2025 AIChE Annual Meeting
(29b) Machine Learning Assessment of Factors Controlling Oil Production in Bakken Formation, Williston Basin, USA
Cumulative oil production depends on a set of interacting factors. To examine these factors in the Bakken Formation, we used the Normalized Production Index (NPI; Sorkhabi and Panja, 2021), expressed in STB/ft/month, i.e., cumulative oil production divided by lateral length and by the number of producing months. NPI values were calculated for 2,093 horizontal wells drilled in the Bakken. For the machine learning applications, the data were normalized to the range zero to one.
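As a rough illustration of the two preprocessing steps described above, the sketch below computes NPI for a few wells and then rescales the values to [0, 1]. It assumes per-column min-max scaling, which the abstract implies but does not spell out. The `Well` record, its field names, and the example numbers are hypothetical and are not taken from the study's dataset.

```python
from dataclasses import dataclass


@dataclass
class Well:
    # Hypothetical record layout; field names are illustrative only.
    cum_oil_stb: float        # cumulative oil production, STB
    lateral_length_ft: float  # horizontal lateral length, ft
    producing_months: int     # number of months on production


def npi(well: Well) -> float:
    """Normalized Production Index (STB/ft/month) = cumulative oil / (lateral length * months)."""
    if well.lateral_length_ft <= 0 or well.producing_months <= 0:
        raise ValueError("lateral length and producing months must be positive")
    return well.cum_oil_stb / (well.lateral_length_ft * well.producing_months)


def min_max_scale(values: list[float]) -> list[float]:
    """Linearly rescale values to [0, 1] (assumed min-max scaling); a constant column maps to zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0] * len(values)
    return [(v - lo) / span for v in values]


if __name__ == "__main__":
    # Hypothetical wells for illustration, not data from the study.
    wells = [
        Well(300_000, 10_000, 24),
        Well(150_000, 5_000, 12),
        Well(90_000, 9_000, 20),
    ]
    npis = [npi(w) for w in wells]
    print(npis)                 # [1.25, 2.5, 0.5]
    print(min_max_scale(npis))  # [0.375, 1.0, 0.0]
```

In a real train/test workflow, the minimum and maximum would usually be taken from the training split only and then reused on the test split, so that no information leaks from the test wells into training.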
The methodology employed multiple feature selection techniques, including Shapley additive explanations (SHAP), filter methods, wrapper methods, and embedded methods, whose results were combined through ensemble selection to identify the most influential variables. Three machine learning algorithms were then tested, each with hyperparameter optimization: (1) multilayer perceptron (MLP), (2) extreme gradient boosting (XGBoost), and (3) random forest (RF).
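The abstract does not say how the outputs of the individual feature selection methods were combined. One common option is mean-rank (Borda-style) aggregation, sketched below purely as an assumption: each method gives an importance score per feature (oriented so that higher means more important), scores are converted to ranks within each method, and ranks are averaged across methods. The function names, feature names, and scores are hypothetical.

```python
def ranks_from_scores(scores: dict[str, float]) -> dict[str, int]:
    """Rank features within one method by descending score (1 = most important)."""
    ordered = sorted(scores, key=lambda feature: scores[feature], reverse=True)
    return {feature: position + 1 for position, feature in enumerate(ordered)}


def ensemble_rank(method_scores: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Combine methods by mean rank (assumed rule); lower mean rank = more influential."""
    if not method_scores:
        raise ValueError("need at least one method")
    per_method = [ranks_from_scores(scores) for scores in method_scores.values()]
    features = set(per_method[0])
    if any(set(ranks) != features for ranks in per_method):
        raise ValueError("every method must score the same set of features")
    mean_rank = {f: sum(r[f] for r in per_method) / len(per_method) for f in features}
    # Sort by mean rank, breaking ties alphabetically for a deterministic order.
    return sorted(mean_rank.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical importance scores, not results from the study.
    scores = {
        "shap":     {"total_water": 0.9, "total_proppant": 0.6, "tvd": 0.3},
        "filter":   {"total_water": 0.5, "total_proppant": 0.7, "tvd": 0.2},
        "wrapper":  {"total_water": 0.8, "total_proppant": 0.4, "tvd": 0.5},
        "embedded": {"total_water": 0.4, "total_proppant": 0.3, "tvd": 0.1},
    }
    print(ensemble_rank(scores))
    # [('total_water', 1.25), ('total_proppant', 2.0), ('tvd', 2.75)]
```

Ranks are used instead of raw scores because SHAP values, filter statistics, and model-based importances live on different scales. Averaging ranks puts every method on an equal footing.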
Feature importance analysis revealed that (1) total base water volume used for hydraulic fracturing has the largest impact, followed by (2 and 3) the surface longitude and latitude of the wells, (4) total proppant weight used for hydraulic fracturing, (5) true vertical depth (TVD), (6) base water volume per fracture stage, (7) proppant weight per fracture stage, and (8) number of fracture stages. Total base water volume and total proppant weight thus appear to play key roles. Well locations (surface longitude and latitude) also have a significant impact on NPI, probably because they mark geological sweet spots. The number of fracture stages ranked lowest because the number of stages increases with lateral length, and both affect oil production positively; since NPI is already normalized by lateral length, much of the stage-count effect is absorbed by that normalization.
In the model comparison, RF delivered the highest prediction accuracy (R²: 0.653, NRMSE: 0.121), closely followed by XGBoost (R²: 0.652, NRMSE: 0.122). The MLP model performed considerably worse (R²: 0.528, NRMSE: 0.142), suggesting that it captured the complex nonlinear relationships between completion parameters and production outcomes less effectively than the tree-based models. The strong performance of the tree-based ensemble methods indicates their suitability for this application.
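For reference, the two metrics reported above can be computed as follows. R² is the standard coefficient of determination. The abstract does not state how NRMSE was normalized, so this sketch assumes division by the observed range (max minus min) of the true values. Other conventions divide by the mean or the standard deviation and give different numbers. The example values are hypothetical.

```python
import math


def r2_score(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def nrmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error divided by the range of y_true (assumed normalization)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two equal-length, non-empty sequences")
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
    span = max(y_true) - min(y_true)
    if span == 0:
        raise ValueError("range of y_true is zero")
    return rmse / span


if __name__ == "__main__":
    # Hypothetical normalized NPI values and predictions, not study results.
    y_true = [0.0, 0.5, 1.0]
    y_pred = [0.1, 0.5, 0.9]
    print(round(r2_score(y_true, y_pred), 4))  # 0.96
    print(round(nrmse(y_true, y_pred), 4))     # 0.0816
```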
This research has several implications for shale field development. First, it establishes NPI not only as a key production performance indicator but also as a reliable metric for forecasting well production. Second, all the predictors employed, including well coordinates, TVD, and well completion data, are known before production begins. Finally, the predictive performance of the tree-based ML models (R² of about 0.65) enables operators to make reliable production forecasts at the completion stage of wells, potentially improving capital allocation decisions and field development planning. This methodology can be extended to other unconventional plays to optimize drilling and completion strategies based on data-driven insights rather than conventional rules of thumb.
References cited:
Lu, C., Jiang, H., Yang, J., Wang, Z., Zhang, M., & Li, J. (2022). Shale oil production prediction and fracturing optimization based on machine learning. Journal of Petroleum Science and Engineering, 217, 110900.
Luo, G., Tian, Y., Sharma, A., & Ehlig-Economides, C. (2019). Eagle Ford well insights using data-driven approaches. International Petroleum Technology Conference, D021S026R003.
Sorkhabi, R., & Panja, P. (2021). Not all shales play the same game: Comparative analysis of US shale oil formations by reverse engineering and petroleum systems. Unconventional Resources Technology Conference, 26–28 July 2021 (pp. 878–892). doi:10.15530/urtec-2021-5660