Currently, the Oil and Gas industry is intensely focused on developing or acquiring machine-learning-augmented geoscience interpretation applications. There are myriad reasons for the intensity of this focus, and just as many opinions about how machine learning will ultimately affect the future of geoscience in Oil and Gas, but one thing is certain: machine learning is here, and it’s here to stay.
Many ascribe the recent successes of machine learning to an increase in available compute power. It is, however, also the result of 25+ years of intensive scientific research, and of the time and money invested in understanding and correctly describing the geoscience problems associated with petroleum systems; it is that accumulated experience that lends wisdom to the outputs of a machine learning process. Simply put, machine learning outputs have to be interpreted and understood in a geologic context to be useful.
Rather than debate the nuances of machine learning adoption as we normally do, we have compiled a list of the top 10 geoscience concepts that machine learning - for all its strengths and benefits - is currently incapable of understanding.
1. Understanding Repeated/Missing Section
One of the first and most fundamental things geoscientists learn about log correlation is how to correctly identify and integrate repeated and/or missing section into their model. Many machine learning tools use unique identifiers to track well markers/tops, and due to the nature of machine learning algorithms these identifiers can only be used once (in the same way that traditional seismic horizons can only have one pick per trace). Ostensibly, this prevents the network from getting “confused” between two data points that are identical in all respects except their vertical position, but it nonetheless remains a huge deficiency of ML tools, as the sketch below illustrates.
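To make the limitation concrete, here is a minimal Python sketch of how a one-identifier-per-top data structure silently discards repeated section; the top names and depths are hypothetical:

```python
# A store keyed on a unique top identifier can hold only one pick per well,
# so the second penetration of the same top overwrites the first.
picks = [
    ("Top_A", 6250.0),  # first penetration (hanging wall)
    ("Top_B", 6400.0),
    ("Top_A", 7100.0),  # repeated section below a thrust fault
    ("Top_B", 7260.0),
]

# One-identifier-per-top storage, as described above:
tops_unique = dict(picks)
print(tops_unique["Top_A"])  # 7100.0 -- the shallower pick is silently lost

# A representation that honors repeated section keeps every penetration:
tops_repeatable = {}
for name, depth in picks:
    tops_repeatable.setdefault(name, []).append(depth)
print(tops_repeatable["Top_A"])  # [6250.0, 7100.0]
```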
2. The Concept of Superposition
This one is hammered into geos from day one - things settle, and as they settle, they build up. The stuff on the bottom is older (with few exceptions) than the stuff on top, and that natural order of things is generally accepted as correct. This is so natural to the geoscientist that the rule is only conspicuous when it’s violated. It also happens to be something that gets very confusing for machine learning applications due to the random or windowed manner in which they access and analyze data. Some applications handle this better than others, but the hierarchical nature of stratigraphy doesn’t lend itself well to unsupervised machine learning processes.
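A short synthetic sketch of why windowed data access loses the ordering that superposition provides; all data here is made up for illustration:

```python
import numpy as np

# A synthetic "layered" trace: 50 constant-amplitude layers, 10 samples each.
rng = np.random.default_rng(0)
trace = np.repeat(rng.normal(size=50), 10)

def random_window(trace, width=32):
    """Pull a training window the way many windowed ML pipelines do."""
    start = int(rng.integers(0, trace.size - width))
    return start, trace[start:start + width]

start_1, window_1 = random_window(trace)
start_2, window_2 = random_window(trace)
# Each window is just a vector of amplitudes. The start positions (and with
# them the relative age implied by superposition) are not part of what the
# network sees unless a human explicitly engineers them back in.
```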
3. Salt Emplacement and Mechanics
Take the concept of superposition and throw in the unpredictable buoyancy, composition, and movement of salt, and you get the field we know as Salt Tectonics. You know what’s really bad at salt tectonics? Machine learning algorithms. Again - for many of the same reasons that superposition can be surprisingly complicated - machine learning algorithms just don’t do well with the physical concepts that govern salt emplacement and tectonism. As with all rules, there are exceptions here as well, such as Bluware’s Hierarchical Deep Learning (HDL) application, which has logic that honors concepts like these hard-coded into it.
4. Lateral Velocity Changes and their Effect on Structure
Diagenesis. Low-grade metamorphism. Low-temperature quartz melting and recrystallization. Lots of strange things happen to rocks when you start playing with the temperatures and pressures they are subjected to. Quite often, these effects vary laterally along the same rock unit due to slight differences in burial depth, stress orientation, composition of hydrothermal influx, etc. These lateral changes in rock physics obviously (to us) manifest themselves in the waveform. However, unless a large quantity of detailed information exists about the physical properties of rock units in an area of interest and how they relate to the seismic signal, machine learning applications can only tell that there is a difference in waveform in that area. They cannot tell what it means, what is causing it, the effect(s) it will have on the quality of the seismic signal beneath it, or the impact it has on the presence or absence of an intact petroleum system.
5. Facies
Similar to, but not quite the same as, the previous point: lateral facies changes are another seemingly simple concept that is in reality quite challenging to integrate into a machine learning algorithm. Think of a classic turbidite system, characterized by a transition from a strongly confined, vertically stacked channel system, to a broader, more weakly confined, laterally accreting channel system, and finally to the broad, thinly bedded sand-shale sequences of the distributive lobe. That concept seems simple to anyone reading this, but how do you teach these largely unique-to-a-location concepts to an algorithm?
This isn’t to say we haven’t been trying our hardest - Stratimagic and later spectral decomposition tools are early examples of neural-network-based applications. They delivered variable results, largely dependent on the type and quality of the input data. It turns out that capturing the brain of a geoscientist in an algorithm (even a sophisticated one) is harder than expected.
Similar challenges arise when identifying a source rock that later became a reservoir, determining volcanic input, deciding on the transport distance and mechanism for units of interest, or tackling a thousand other problems that require experience to put into a meaningful context.
6. Seismic Acquisition and Processing Artifacts
Introducing noise into the seismic signal while collecting it is as old as seismic acquisition and processing itself. Several decades and many, many millions of dollars have gone into understanding artifacts and minimizing their presence in the final stacked result. While prestack interpretation methods are becoming more popular and, by extension, are allowing us to mitigate the effect of processing artifacts, recognizing and removing acquisition artifacts remains an important skill that machine learning tools can’t yet match a human being at. To most machine learning algorithms, anything that’s been labeled as relevant is relevant…and this includes noise and artifacts, as the sketch below illustrates. One can see how labeling data for machine learning can become a very complicated process very quickly.
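Here is a synthetic sketch of that labeling pitfall (array sizes and amplitudes are arbitrary): when a label polygon covers both geology and an artifact, the artifact becomes a positive training example.

```python
import numpy as np

rng = np.random.default_rng(1)
image = rng.normal(size=(100, 100))  # stand-in for a seismic amplitude slice
image[:, 40] += 3.0                  # vertical acquisition stripe (pure noise)
image[45:55, :] += 2.0               # a real geologic event, e.g. a channel

labels = np.zeros_like(image, dtype=int)
labels[40:60, :] = 1                 # the interpreter labels the channel zone...

# ...but that zone also contains part of the noise stripe, so to the
# algorithm the artifact is now just as "relevant" as the geology.
print(labels[:, 40].sum())           # 20 artifact pixels labeled as signal
```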
7. Compressional Tectonics
Features like overturned or vertical beds and velocity inversions cause a lot of issues in seismic imaging. Vertical bedding is virtually invisible in seismic reflection data, and the repetition of section introduces all sorts of wacky imaging issues due to velocity inversions and seismic dispersion effects. The result is input data of insufficient quality to reliably drive certain machine learning approaches, especially considering that even the processing of data from compressional regimes involves assumptions and a high degree of human input.
Compressional regimes are also structurally more complex than extensional regimes, and they require a good understanding of compressional tectonics and mechanisms. This complexity demands a model-driven approach to interpretation. This is an important point: because most hydrocarbon exploration and production activities focus on extensional basins, that is the sort of data used to design and refine machine learning tools. These tools are therefore optimized for extensional basins, and thus follow the logic and geoscience principles of extensional basins. Using them in compressional basins is only now starting to be investigated and will require input and supervision from experienced geoscientists to produce meaningful outputs.
8. Reactivated Basins & Reworked Sediments
The concept of fault reactivation relates to basins with multiple episodes of sediment deformation and/or multiple tectonic phases. Preexisting fault planes formed during the extensional phase are reactivated and reused during subsequent compressional phases, and this tectonic reversal can repeat several times over the lifetime of the basin. Reactivation events form traps - one of the key elements of the petroleum system - and should be interpreted carefully. Identifying structures and removing younger deformation from older strata requires interpreting, in multiple steps, the many surfaces that represent the geologic history of the basin. This is another concept well known to trained geoscientists, but one that introduces data-quality and logic issues that machine learning applications are not currently designed to overcome.
9. Filtering Data
Geoscientists are a very visual group, so we’re used to filtering data to make it look prettier. Those same filtering processes can remove important information that is relevant to a machine learning tool. Armed with this knowledge, we can use what we know about pre- and post-stack seismic filtering to enhance important information - in other words, to optimize our data not for human visualization but for machine learning processes, as sketched below.
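As a sketch of what filtering for the machine rather than the eye might look like (assuming SciPy is available; the corner frequencies are illustrative placeholders, not recommendations):

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(trace, low_hz, high_hz, fs_hz, order=4):
    """Zero-phase Butterworth band-pass of a single seismic trace."""
    nyquist = 0.5 * fs_hz
    b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="band")
    return filtfilt(b, a, trace)  # forward-backward filtering: no phase shift

fs_hz = 250.0  # 4 ms sample interval -> 250 Hz sampling rate
trace = np.random.default_rng(2).normal(size=1000)  # stand-in trace

# A filter tuned for screen display might keep a narrow, "pretty" band, while
# a filter that preserves features for an ML model might keep a wider one.
for_display = bandpass(trace, 10.0, 60.0, fs_hz)
for_ml = bandpass(trace, 5.0, 90.0, fs_hz)
```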
Automatic filtering of data can leave far too many decisions up to the application, so this is another task that requires a human babysitter.
10. Data quality and confidence
Machine learning algorithms have a glaring weakness when it comes to geoscience data - not only seismic data but all types. One thing that is unique to an experienced geoscientist is the ability to objectively gauge the quality of the data they are analyzing. Data quality can even change within a single survey, something that is difficult to communicate to machine learning applications. If a machine learning application is told that data is relevant, no quality measurement is made: all relevant data is equally relevant, and the confidence level in that data is not taken into account. While human interpreters can integrate the data-quality dimension into their models, most machine learning applications cannot - at least not without human intervention and a different kind of training, as sketched below. In short, humans understand data quality’s impact on a final model and are able to incorporate it into their evaluations; machine learning applications do not have this ability, at least not at an unsupervised level (and not yet).
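One form that human intervention can take is a per-sample confidence weight supplied by the interpreter. A minimal sketch, assuming scikit-learn is available; the features, labels, and quality scores are all synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))  # e.g., attributes extracted per trace or sample
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)  # synthetic labels

# Interpreter-assigned confidence per sample: 0 = poor data, 1 = good data.
quality = rng.uniform(0.2, 1.0, size=200)

model = LogisticRegression()
model.fit(X, y, sample_weight=quality)  # low-quality samples influence the fit less
```

The weighting does not make the algorithm understand data quality; it simply lets a human encode their quality judgment in a form the algorithm can consume.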
Today many geoscientists are strategizing about how they should retrain and what skills will be required of Oil and Gas geoscientists in the future. We think fundamental knowledge of geologic principles should be communicated and rigidly honored during the design of machine learning tools, and that the ability to communicate key principles to software developers is a skill of growing importance - one at which properly trained geoscientists can excel.