by Jane McConnell
OSDU Management Committee Member / O&G Data Architect, Teradata
My previous post here was a re-post of an old article I wrote about what’s been wrong with subsurface data management for the longest time. Since writing that, I have been waiting for the day when I could write a happy follow-up article about how we have successfully changed the world (of subsurface data management, at least).
And here I finally am, on the OSDU Marketplace blog, writing the first happy follow-up article. Reasons to be cheerful, indeed.
A side note on the title. This is sort of Part 3 (as the title says) of the blog series that started with “10 Things”: in my next post I’m going to go back to the 2015 follow-up to “10 Things”, and after that there will be a Part 4 where I explain how the OSDU approach to data management supports that too. But mainly, in all honesty, it’s because I couldn’t resist a reference to Ian Dury and the Blockheads: https://www.youtube.com/watch?v=CIMNXogXnvE
Back to the main topic of the day. How are we changing the world of Subsurface Data Management with the OSDU Data Platform?
Let’s start at the beginning with Hate #1 from the original blog – “Data Silos”.
One of the opportunities we have when building out a new solution with cloud-native technologies and scalable storage is to build one single ecosystem in which to store all of our data. Inside that one ecosystem we can use many different types of storage, many instances of a type of storage, or both. We can work with storage technologies that are built to scale – built to execute queries in parallel, built to store many replicas of the data, or both. A lot of the limitations that we are used to are simply no longer there.
Although I joked in Hate #6 that the Oil Industry didn’t have Big Data, much of our data does meet one of the definitions of Big Data, often attributed to John Mashey: “data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time”. I assume most of you can remember the days when we had to split seismic surveys into multiple SEG-Y files of less than 2GB each, because that was the size of the whole (FAT16) file system on our Windows PCs? And then how, later on, we ran up against the 2GB per-process memory limit on 32-bit operating systems. Our data always seemed to be a bit bigger than the technology we had available to manage it.
Part of the reason that we built those data silos in the past was the valid reason that the kinds of data we were trying to store were too big, too slow, or just too damn awkward to handle inside an average data storage solution, and so we had to pick a particular technology for each specific kind of data. That drove seismic data management solutions that were intrinsically linked to tape robots and proprietary robotic tape library (RTL) software, and well log data management solutions built on proprietary database technologies. But once we had built those separate data silos, we started to cause ourselves a world of pain with master data management (ensuring data was referenced to the correct wellbore, seismic project, or other key entity), and we made it very hard to bring different types of data back together again.
Now that we have a scalable data platform – a data ecosystem that can handle all types of large or badly shaped data – we can act differently.
Regardless of the “type” of data – a seismic survey, a well log in DLIS format, a PDF of a composite log, a RESQML file, a core photo, whatever – let’s put it all in the same place: in the data platform. Regardless of how we may or may not use the data next, let’s make the first step always be to land the data – all data – into a modern, scalable cloud environment. And let’s design this place so it can serve as the System of Record for that data.
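To make that concrete, here is a minimal sketch of what “landing” a file might look like, assuming the signed-URL upload flow of the OSDU File service. The base URL, partition id, and token are placeholders, and the response shape is illustrative rather than a definitive client implementation.

```python
# Illustrative sketch only: endpoint path, headers, and response fields are
# assumptions based on the OSDU File service signed-URL flow.
import requests

OSDU_BASE = "https://osdu.example.com"        # hypothetical deployment URL
HEADERS = {
    "Authorization": "Bearer <access-token>",  # placeholder credentials
    "data-partition-id": "opendes",            # example partition id
}

def land_file(local_path: str) -> str:
    """Land any file (SEG-Y, DLIS, PDF, core photo, ...) into the platform."""
    # 1. Ask the platform for a signed upload location.
    resp = requests.get(f"{OSDU_BASE}/api/file/v2/files/uploadURL", headers=HEADERS)
    resp.raise_for_status()
    location = resp.json()["Location"]

    # 2. Stream the raw bytes to scalable cloud storage, unchanged.
    with open(local_path, "rb") as f:
        requests.put(location["SignedURL"], data=f).raise_for_status()

    # 3. Return the platform's handle to the landed file, which the
    #    metadata record we create next will reference.
    return location["FileSource"]
```

The point of the sketch is the shape of the step, not the specific calls: every data type takes the same path into the same place.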
But let’s not stop there. Let’s index it – all of it – so we can find it again. Although we can choose which attributes we want to use to discover that data on a per-data-type basis, we can still design the platform to have one index to search.
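Here is a hedged sketch of what searching that one index might look like, assuming the OSDU Search API’s query endpoint; the kind wildcard and query attribute are invented examples, not a prescribed schema.

```python
# A minimal sketch, assuming the OSDU Search API query endpoint; the base URL,
# token, partition id, kind pattern, and query attribute are all illustrative.
import requests

SEARCH_URL = "https://osdu.example.com/api/search/v2/query"  # hypothetical deployment
HEADERS = {
    "Authorization": "Bearer <access-token>",  # placeholder credentials
    "data-partition-id": "opendes",            # example partition id
}

query = {
    "kind": "*:*:*:*",            # one index across every data type; narrow as needed
    "query": "data.Name:North*",  # discover by whichever attributes we indexed
    "limit": 10,
}

resp = requests.post(SEARCH_URL, headers=HEADERS, json=query)
resp.raise_for_status()
for result in resp.json().get("results", []):
    print(result["kind"], result["id"])  # hits come back from one place, whatever the type
```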
And at this initial stage, let’s deal up front with the very real concerns about our legal rights to use the data, by tagging it with what we need to know about legal ownership, country of origin, security classification, export restrictions, and anything else we will need in future to manage that data legally and ethically.
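As an illustration of the kind of tagging this implies, here is a sketch of a record carrying OSDU-style legal and acl blocks; the legal tag name, group names, country codes, and data attributes are all invented examples.

```python
# A sketch of the compliance metadata carried on every record, following the
# OSDU record shape (legal and acl blocks); all names below are examples.
record = {
    "kind": "osdu:wks:dataset--File.Generic:1.0.0",
    "acl": {
        # Who may see and who may manage this record (example group names).
        "viewers": ["data.default.viewers@opendes.example.com"],
        "owners": ["data.default.owners@opendes.example.com"],
    },
    "legal": {
        # Legal tags encode ownership, contract, and export constraints.
        "legaltags": ["opendes-example-acquisition-agreement"],
        # Countries whose regulations are relevant to this data.
        "otherRelevantDataCountries": ["NO", "GB"],
        "status": "compliant",
    },
    "data": {"Name": "Example composite log", "Source": "scanned-archive"},
}
```

Because these blocks travel with the record from the moment it lands, every later consumer of the data inherits the answers to “can I legally use this?” instead of having to reconstruct them.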
By bringing all the data into the same storage platform like this, we have a great foundation for everything that comes next. And this is when it starts to get really interesting. By adhering to key data principles like “data is immutable”, “don’t forget what you already know”, and “all raw data is preserved”, we can fulfil the need that drives Hate #3 “Library Style Data Management” without compromising our ability to do more with the data in the next stage. What we can do next is improve the data, enrich the data, crack open some of those industry-standard formats, and create additional versions of the data that are easier to consume. We can link datasets together using common master data. We can use one kind of data to improve datasets of another kind. All while adhering to additional key data principles like “Improved data is new data” and “Data lineage is tracked”. This will go a long way to helping us address Hate #5 “Never fixing the data”, and if we do this right, we can finally start making some inroads into dealing with Hate #7 “Decisions by PowerPoint”.
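To show how “Improved data is new data” and “Data lineage is tracked” fit together, here is a sketch assuming the OSDU record model’s ancestry block: the enriched product is a brand-new record that points back at a specific version of the untouched raw record. The ids, kinds, and attributes are illustrative.

```python
# A sketch only: record ids, kinds, and data attributes are invented examples;
# the ancestry block follows the OSDU record model for lineage tracking.
raw_id = "opendes:dataset--File.Generic:raw-dlis-0001"  # the preserved raw record
raw_version = 1                                         # data is immutable, so we pin a version

enriched_record = {
    "kind": "osdu:wks:work-product-component--WellLog:1.0.0",
    "acl": {"viewers": ["data.default.viewers@opendes.example.com"],
            "owners": ["data.default.owners@opendes.example.com"]},
    "legal": {"legaltags": ["opendes-example-acquisition-agreement"],
              "otherRelevantDataCountries": ["NO"],
              "status": "compliant"},
    "ancestry": {
        # Improved data is new data: we never overwrite the raw DLIS record;
        # instead, the new record declares exactly which parent it came from.
        "parents": [f"{raw_id}:{raw_version}"],
    },
    "data": {"Name": "Curves extracted from raw DLIS",
             "Curves": ["GR", "RHOB", "DT"]},
}
```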
Let’s stop here for now – and in our next instalments, we’ll talk more about how we move forward. Next up though, we’ll time-travel back to 2015 again and see how I explained it then – with an approach you may have heard of called “aggregation of marginal gains”.
Jane McConnell Bio
Coming from an IT background, Jane fell into Upstream Oil & Gas with its specific data management needs back in 2000, and has been consulting, developing product, and implementing solutions in this area since. She has done time at both Landmark and Schlumberger – in R&D, product management, consulting, and pre-sales – and in one role or another, she has influenced subsurface data management projects for most major oil companies across Europe. She chaired the Education Committee for the European oil industry data management group ECIM, and regularly presents at Oil Industry data management events. Along the way she’s learned enough about geophysics, geology, and the world of hydrocarbon exploration and production to be dangerous.
For the last 8 years she has been working at Teradata, a big data and analytics company, in their Oil and Gas Practice. She also serves on the OSDU Management Committee.
Jane is Scottish and has a stereotypical love of whisky (especially Islay malts). She knits and reads almost to obsession.
You can contact her at jane.mcconnell@teradata.com, on the OSDU Slack channels, or connect on LinkedIn.