by Einar Landre
Lead Analyst, Equinor emerging digital technologies
Until roughly 60 years ago, earth models, like most other models, were paper based. With the development of computers, earth models have become digital, with raw data and derived knowledge captured in software and databases.
Despite digital tooling, earth science work practice has not changed much. Scientists collect data, build models and hypotheses, analyse, and predict possible outcomes.
One thing that has changed is the amount of data. The dataset Equinor shared for the retired Volve field consists of 40,000 files, and Volve was a small field that produced for only a decade. Knowing which dataset can, should, or could be used for which type of work is not trivial.
Another challenge is that each discipline has its own specialist tools emphasising different aspects of the dataset. This means that two disciplines will struggle to synthesise their results at the end. Adding to the challenge is the fact that models are tied to a tool and to the individual user's preferences and understanding.
The result is individual-centred, discipline-specific tooling that makes a holistic (trans-disciplinary) view difficult, if possible at all. In other words: we have tools for the forest, tools for the trees, and tools for the leaves, but no tool that allows us to study trees in the context of the forest or leaves in the context of a tree. Philosophically speaking, this is the result of applying reductionism to a complex problem.
The effect is fragmentation and inconsistency across individuals, tools, disciplines, and organisational units, leading to loss of knowledge, loss of trust, and continuous rework as people struggle to build on each other's work.
Fragmentation is a good starting point for the next topic, one that creates a lot of pain when building digital data models: the question of what is one thing, and when does a thing change so much that it becomes a new thing?
According to William Kent in Data and Reality, answering what is one thing forces us to explore three key concepts:
- Oneness. What is one thing?
- Sameness. When do we say two things are the same, or the same thing? How does change affect identity?
- Categories. What is it? In what categories do we perceive the thing to be? What categories do we acknowledge? How well defined are they?
Oneness underlies the general ambiguity of words. We will use an example from the book regarding the word “well” as used in the files of an oil company:
In their geological database, a “well” is a single hole drilled in the surface of the earth, whether or not it produces oil. In the production database, a “well” is one or more holes covered by one piece of equipment, which has tapped into a pool of oil. The oil company had trouble integrating these databases to support a new application: the correlation of well productivity with geological characteristics.
This implies that the word well is ambiguous across contexts, and the ambiguity lies in the understanding of the concept itself: the production database might have used the term well for what a geoscientist would think of as a wellbore. Reading along we find this:
As analysts and modellers, we face “oneness”, “sameness” and “categories”. Oneness means coming up with a clear and complete explanation of what we are referring to. Sameness means reconciling conflicting views of the same term, including whether changes (and what types of changes) transform the term into a new term. Categories means assigning the right name to the term and determining whether it is an entity type, a relationship, or an attribute in a data model. Oneness, sameness and categories are tightly intertwined with one another.
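Kent's well ambiguity can be made concrete in a small sketch. The class and field names below are illustrative, not any real database schema: the geology context and the production context each define their own notion of a well, and a naive merge of the two is impossible.

```python
from dataclasses import dataclass
from typing import List

# Geology context: a "well" is a single hole drilled in the surface
# of the earth, whether or not it produces oil.
@dataclass
class GeologyWell:
    well_id: str
    location: tuple          # (latitude, longitude)
    produces_oil: bool

# Production context: a "well" is one or more holes covered by one
# piece of equipment that has tapped into a pool of oil.
@dataclass
class ProductionWell:
    equipment_id: str
    holes: List[str]         # identifiers of the individual holes

# The same physical reality, modelled twice:
g1 = GeologyWell("G-01", (58.44, 1.89), produces_oil=True)
g2 = GeologyWell("G-02", (58.45, 1.90), produces_oil=True)
p = ProductionWell("EQ-7", holes=["G-01", "G-02"])

# "How many wells are there?" now has two correct answers,
# depending on which context you ask in.
geology_count = 2      # two holes
production_count = 1   # one piece of equipment
```

The question “how many wells?” is unanswerable until you state which context you are asking in, which is exactly the trouble the oil company ran into when integrating the two databases.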
The main takeaway is that the ambiguities found in a trans-disciplinary field such as earth science will create problems if not properly addressed. These ambiguities have more to do with how disciplines think and express themselves than with finding a digital representation. The challenge sits with the semantics of language.
The value of standardising terminology can be seen in medicine and anatomy, where every piece of the human body is given a Latin name that is taught and used across disciplines.
Domain-Driven Design provides two architectural patterns, Bounded Context and Context Mapping, that help software architects and data modellers create explicit contexts and context relationships. Bounded contexts and context maps allow architects to practice divide and conquer without losing the whole, although reconciling contexts might prove much harder than first thought. The picture below shows how bounded contexts can be applied to the well problem described above.
By modelling the two applications as bounded contexts, it becomes clear that reconciliation requires a new context: forcing one application's definition on the other will not work. It is therefore better to create a new bounded context that reconciles the differences by introducing new terminology.
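A sketch of what such a reconciling context could look like, using hypothetical names rather than the actual OSDU schema: it reserves wellbore for the individual hole and well for the producing unit, and translates each source context into the new terminology instead of letting either definition leak across.

```python
from dataclasses import dataclass
from typing import List

# Reconciling bounded context (illustrative names, not the OSDU schema):
# a Wellbore is one hole; a Well groups the wellbores served by one
# piece of production equipment.
@dataclass
class Wellbore:
    wellbore_id: str
    location: tuple

@dataclass
class Well:
    well_id: str
    wellbores: List[Wellbore]

def from_geology(record: dict) -> Wellbore:
    """Translate a geology-database "well" into the new context."""
    return Wellbore(wellbore_id=record["well_id"],
                    location=record["location"])

def from_production(record: dict, bores: List[Wellbore]) -> Well:
    """Translate a production-database "well" into the new context."""
    return Well(well_id=record["equipment_id"], wellbores=bores)

bores = [from_geology({"well_id": "G-01", "location": (58.44, 1.89)}),
         from_geology({"well_id": "G-02", "location": (58.45, 1.90)})]
well = from_production({"equipment_id": "EQ-7"}, bores)

# Productivity (production context) can now be correlated with
# geological characteristics (geology context) via the wellbores.
```

The translation functions play the role of what Domain-Driven Design calls an anti-corruption layer: neither source database has to change, and the new context owns the unambiguous vocabulary.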
The role of a data platform
A data platform stores, manages, and serves data to consumers while adhering to the governance rules defined for the data. A data platform is not a database, but it can be built using database technology. A data platform must also address the problems that come from semantic ambiguity (what is one thing), support the timescales and complexities found in the physical world, and accommodate the reality that both the physical world and its digital representation change with time. In other words, the data platform must support the nature of scientific work.
Scientific work can be seen as a four step process:
- Gather data, including deciding what data is needed
- Analyse data, predict outcomes and devise alternative courses of action
- Perform the most attractive course of action
- Monitor effects, gather more data and repeat
This is the approach most professions use, be it medical doctors, car mechanics, detectives, geoscientists, airline pilots, or intelligence officers. Analysis can involve inductive or abductive reasoning depending on the context and the problem at hand. Be aware that any reasoning, judgement, and classification depends on trustworthy data. Without trustworthy data, there is no trustworthy course of action.
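The four-step cycle can be sketched as a simple control loop. This is a schematic of the work pattern, not platform code; the step functions are placeholders for whatever a given profession actually does.

```python
def scientific_loop(gather, analyse, act, monitor, iterations=3):
    """Schematic of the four-step scientific work cycle:
    gather, analyse, act, monitor -- then repeat with more data."""
    data = gather()                      # 1. gather data
    decisions = []
    for _ in range(iterations):
        course_of_action = analyse(data) # 2. analyse, pick a course of action
        outcome = act(course_of_action)  # 3. perform it
        data = data + monitor(outcome)   # 4. monitor effects, gather more data
        decisions.append(course_of_action)
    return decisions

# Toy example: each round, act on the largest observation seen so far.
decisions = scientific_loop(
    gather=lambda: [1, 4, 2],
    analyse=lambda d: max(d),
    act=lambda a: a + 1,
    monitor=lambda outcome: [outcome],
)
# Each iteration's decision builds on data produced by the previous one.
```

The point of the sketch is the feedback edge: step 4 feeds new observations back into step 1, so the quality of every later decision rests on the trustworthiness of the accumulated data.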
The tale of a rock
The OSDU™ Data Platform supports scientific workflows, and the best way to describe what that entails is through an example, so here we go.
Below is a picture of a rock that I found many years ago. At first glance, a fact sheet can be made capturing observable basic facts such as the location where it was found, its density (weight/volume), and the camera (sensor) used to make the image. Further, that it contains quartz (white), olivine (green), and a body that most likely is eclogite (a layman's assessment).
Since the stone has several areas of interest and is a three-dimensional object, several images are needed. The easiest way to deal with the images is to place them in a cleverly named file folder and create a fact sheet in a database, referencing the image folder and the physical archive location.
Areas of interest are photographed and classified in more detail. Where should the classification be stored? A separate database table could work, but as new needs emerge, what began as a simple thing becomes unmanageable. One physical stone has become multiple independent digital artefacts, and we are knee deep in attempting to answer “what is one thing?”. Add to this that in the real world we do not have a handful of datasets but thousands; Equinor's Volve dataset alone counts 40,000 files.
The OSDU™ Data Platform is made for this kind of problem, as can be seen in the next picture. The OSDU™ Data Platform includes a content store (files), a catalogue store (documents), and a search engine (Elastic).
In this case the content is stable. The rock does not change, but our understanding of the rock, derived from the images, might change. Let's say that we become interested in the main body of the rock. Then we can go back to the original, make a new sample, and add it to the structure. We have derived new insight from existing data and captured it as shown in the diagram below.
The diagram shows how a third sample has been added to the story, and in addition an interpretation has been derived and linked to the area. The model captures insights as they emerge. This is the core of scientific work. It can be argued that the OSDU™ Data Platform is a knowledge capture system.
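The rock story can be sketched as catalogue documents referencing content files. This is a simplification under assumed names, not the actual OSDU record format: the content store holds the image files, while the catalogue holds small linked documents for the fact sheet, the samples, and the derived interpretation.

```python
from dataclasses import dataclass
from typing import List

# Catalogue documents (illustrative structure; values are made up).
@dataclass
class RockFactSheet:
    rock_id: str
    location_found: str
    density_kg_per_l: float
    image_files: List[str]        # references into the content store

@dataclass
class Sample:
    sample_id: str
    rock_id: str                  # back-reference to the fact sheet
    image_file: str
    classification: str

@dataclass
class Interpretation:
    interpretation_id: str
    sample_id: str                # derived knowledge links to its source
    text: str

rock = RockFactSheet("R-1", "Norway", 3.1,
                     ["rock/overview-1.jpg", "rock/overview-2.jpg"])
samples = [
    Sample("S-1", "R-1", "rock/area-1.jpg", "quartz"),
    Sample("S-2", "R-1", "rock/area-2.jpg", "olivine"),
    Sample("S-3", "R-1", "rock/area-3.jpg", "eclogite (main body)"),
]
interpretation = Interpretation("I-1", "S-3",
                                "Main body is most likely eclogite.")
```

One physical stone has become many digital artefacts, but because every artefact carries a link back to its source, the collection is still answerable as one thing, which is exactly what the flat file-folder-plus-table approach loses.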
The OSDU™ Data Platform is made to capture subsurface knowledge, as illustrated in the diagram below. Seismic data is acquired for a geographical area through surveys. The end products of a seismic survey are seismic data files that are interpreted using specialist software tools, and one of the findings can be a horizon.
Seismic is measured in the time domain, which implies that determining the depth of the horizon requires correlation with markers from a well log, as shown to the left in the diagram. Markers are made when the wellbore is drilled and can be backed by the cuttings that come out of the borehole.
There are two main takeaways from this diagram. Firstly, datasets in terms of log files, rock images, and seismic images are contextualised by a domain-specific catalogue structure. Secondly, as data is analysed and knowledge is derived in terms of markers and horizons, the insights are managed as data. This is, by the way, an example of how causal inference works out in practice.
As time goes by, new wellbores are drilled and new seismic surveys are conducted, so both the amount of raw data and the knowledge derived from it grow. This takes us to the two most important capabilities provided by the OSDU™ Data Platform: lineage (also known as provenance) and immutability.
Lineage enables provenance: knowing what data was used as the source for a horizon or a marker. Provenance is the key to trustworthy information. As new data emerges, old insights are not deleted but superseded by a new instance that is linked to the previous one. This means that the platform can hold multiple versions of the horizon and markers in the diagram above and capture how the understanding of the underground has evolved over time.
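Immutability plus lineage can be sketched as a chain of record versions, where a new interpretation supersedes, but never overwrites, the old one. The structure below is hypothetical, not the OSDU storage API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)            # frozen: a version is immutable once written
class HorizonVersion:
    version: int
    source_datasets: tuple         # lineage: the data this version derives from
    supersedes: Optional["HorizonVersion"] = None

# First interpretation, based on the original survey.
v1 = HorizonVersion(1, ("survey-2010.segy",))

# A new survey arrives: we add a version; v1 is kept, not deleted.
v2 = HorizonVersion(2, ("survey-2010.segy", "survey-2020.segy"),
                    supersedes=v1)

def lineage(h: Optional[HorizonVersion]) -> list:
    """Walk the supersedes chain: how did our understanding evolve?"""
    chain = []
    while h is not None:
        chain.append(h.version)
        h = h.supersedes
    return chain
```

Because every version records both its source datasets and the version it supersedes, any horizon can be traced back to the raw data it was derived from, which is what makes the information trustworthy.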
The reader should now have a feel for the basic capabilities provided by the OSDU™ Data Platform, including some of its scientific fundament. It should also be clear that it is more than a data platform. It is better understood as a knowledge capture system that contextualises data into information that can be reasoned about and used to support decisions while maintaining provenance.
Further, the OSDU™ Data Platform resolves some of the hardest parts of earth science and earth modelling while facing one of computer science's hardest questions: what is one thing? These are topics that will be revisited.
Hopefully you have enjoyed the journey, and I can promise that more will follow. Finally, I hope readers see the potential of both the technology and the approach for sectors beyond earth science.
Einar Landre is a practicing software professional with more than 30 years' experience as a developer, architect, manager, consultant, and author/presenter. He currently works for Equinor as lead analyst in its emerging digital technologies department, engaged in technology scouting and open source software development (OSDU), with a special interest in Domain-Driven Design, AI, and robotics. Before joining Equinor (formerly Statoil), Mr. Landre held positions as consultant and department manager with the Norwegian firm Bouvet, development manager of TeamWide, technical adviser with Skrivervik Data (a SUN & CISCO distributor), and software developer with Norsk Data, where he implemented communication protocols, operating systems, and test software for the International Space Station.