Improved well log classification using semisupervised Gaussian mixture models and a new hyper-parameter selection strategy. (July 2020)
- Record Type:
- Journal Article
- Title:
- Improved well log classification using semisupervised Gaussian mixture models and a new hyper-parameter selection strategy. (July 2020)
- Main Title:
- Improved well log classification using semisupervised Gaussian mixture models and a new hyper-parameter selection strategy
- Authors:
- Dunham, Michael W.
Malcolm, Alison
Welford, J. Kim
- Abstract:
- Well log classification, the process of mapping well log measurements to lithofacies identified from core samples, is a common procedure in the oil and gas industry. Manually assigning lithofacies to wire-line log measurements without core can be time-consuming and can also introduce bias. Supervised machine learning algorithms are commonly used to automate this process, but they are prone to overfitting when training data are scarce, which is common in well log classification problems. Semisupervised machine learning algorithms are designed for classification problems with minimal training data, and we adopt a semisupervised Gaussian mixture model (ssGMM) method to solve this problem. The dataset we consider is from a machine learning competition held in 2016, and we simulate a semisupervised scenario by assuming that only one of the ten wells is labeled. We apply ssGMM to this well log dataset and compare its performance to that of XGBoost, the supervised method that won the competition. To try to improve the performance of both ssGMM and XGBoost, we also introduce a new hyper-parameter selection strategy that simultaneously uses the mean and standard deviation of the cross-validation scores, in contrast to the default procedure, which uses only the mean cross-validation scores. Our results indicate that ssGMM slightly outperforms XGBoost in our semisupervised context, which supports the suggestion that semisupervised algorithms are more appropriate in low training data situations. We also show that our new hyper-parameter selection technique selects hyper-parameters for ssGMM that perform better on the testing data, but the results are mixed for XGBoost.
Highlights: Implement a semisupervised Gaussian mixture model method for well log classification. Introduce a new hyper-parameter selection strategy that directly includes standard deviations. Simulate a realistic scenario with limited well log training data. Semisupervised Gaussian mixture models slightly outperform XGBoost in this scenario.
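The abstract does not state exactly how the mean and standard deviation are combined. As a minimal sketch, assuming a simple penalized score (mean minus a multiple of the standard deviation, which is not necessarily the authors' criterion), the contrast with the default mean-only rule might look like this in Python. The candidate names, fold scores, and the `penalty` weight are hypothetical.

```python
from statistics import mean, stdev


def select_by_mean(cv_scores):
    """Default rule: return the candidate whose fold scores have the highest mean."""
    return max(cv_scores, key=lambda name: mean(cv_scores[name]))


def select_by_mean_and_std(cv_scores, penalty=1.0):
    """Assumed mean-and-spread rule (not necessarily the paper's criterion):
    rank candidates by mean(score) - penalty * stdev(score), so settings whose
    accuracy varies strongly between folds are penalized."""
    def penalized(name):
        scores = cv_scores[name]
        # Sample standard deviation; requires at least two folds.
        return mean(scores) - penalty * stdev(scores)
    return max(cv_scores, key=penalized)


if __name__ == "__main__":
    # Hypothetical per-fold accuracies for two hyper-parameter settings.
    fold_scores = {
        "candidate_A": [0.80, 0.40, 0.80, 0.40],  # mean 0.60, large spread
        "candidate_B": [0.56, 0.58, 0.56, 0.58],  # mean 0.57, small spread
    }
    print(select_by_mean(fold_scores))          # candidate_A
    print(select_by_mean_and_std(fold_scores))  # candidate_B
```

Setting `penalty=0.0` recovers the mean-only rule. With scikit-learn's `GridSearchCV`, the corresponding per-candidate quantities are reported as `mean_test_score` and `std_test_score` in `cv_results_`.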
- Is Part Of:
- Computers & geosciences. Volume 140(2020)
- Journal:
- Computers & geosciences
- Issue:
- Volume 140(2020)
- Issue Display:
- Volume 140, Issue 2020 (2020)
- Year:
- 2020
- Volume:
- 140
- Issue:
- 2020
- Issue Sort Value:
- 2020-0140-2020-0000
- Page Start:
- Page End:
- Publication Date:
- 2020-07
- Subjects:
- Lithofacies -- Well logs -- Semisupervised -- Classification -- Hyper-parameter selection
Environmental policy -- Periodicals
550.5
- Journal URLs:
- http://www.sciencedirect.com/science/journal/00983004
http://www.elsevier.com/journals
- DOI:
- 10.1016/j.cageo.2020.104501
- Languages:
- English
- ISSNs:
- 0098-3004
- Deposit Type:
- Legaldeposit
- View Content:
- Available online (eLD content is only available in our Reading Rooms)
- Physical Locations:
- British Library DSC - 3394.695000
British Library DSC - BLDSS-3PM
British Library HMNTS - ELD Digital store
- Ingest File:
- 13411.xml