2025 AIChE Annual Meeting

(29b) Machine Learning Assessment of Factors Controlling Oil Production in Bakken Formation, Williston Basin, USA

Authors

Yeonpyeong Jo - Presenter, Inha University
Palash Panja, University of Utah
Rasoul Sorkhabi, University of Utah
Shale oil production has increased approximately 1.5-fold since the COVID-19 pandemic, from 6 million barrels of oil per day (BOPD) in 2021 to 9 million BOPD in 2024. Despite oil price volatility, operators continue drilling new wells with greater vertical depths or longer laterals. New field developments require better production forecasting methods, which in turn depend on a quantitative understanding of the factors that control oil production from horizontal wells in tight formations. The complexity of these factors makes physics-based modeling very challenging and creates opportunities for data-driven approaches. A number of researchers have used machine learning (ML) methods to predict oil production from unconventional plays. For example, using data from numerical simulations, Lu et al. (2022) employed deep neural networks to forecast shale oil production rates (R²: 0.86-0.87). Shale plays vary greatly in their production because of both geological and well completion factors (Sorkhabi and Panja, 2021). In this study, we conducted data-driven ML modeling for the Bakken play in the Williston Basin to predict oil production as influenced by various factors.

Cumulative oil production depends on a set of interacting factors. To examine these factors in the Bakken formation, we used the Normalized Production Index (NPI; Sorkhabi and Panja, 2021) in units of STB/ft/month, i.e., cumulative oil production divided by lateral length and by the number of producing months. Individual NPI values were calculated for 2,093 horizontal wells drilled in the Bakken. The data were normalized to the range of zero to one for machine learning applications.
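A minimal Python sketch of the NPI calculation and the rescaling step is given here for illustration. The three well records and their field names are hypothetical, and min-max scaling is an assumption: the abstract states only that the data were normalized to a range of zero to one.

```python
def npi(cum_oil_stb, lateral_ft, months):
    """Normalized Production Index in STB/ft/month:
    cumulative oil / (lateral length * producing months)."""
    if lateral_ft <= 0 or months <= 0:
        raise ValueError("lateral length and producing months must be positive")
    return cum_oil_stb / (lateral_ft * months)


def min_max_scale(values):
    """Rescale values linearly to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical wells, for illustration only (not data from the study).
wells = [
    {"cum_oil_stb": 120_000, "lateral_ft": 10_000, "months": 24},
    {"cum_oil_stb": 54_000, "lateral_ft": 9_000, "months": 20},
    {"cum_oil_stb": 320_000, "lateral_ft": 10_000, "months": 40},
]

npi_values = [npi(**w) for w in wells]
npi_scaled = min_max_scale(npi_values)

for raw, scaled in zip(npi_values, npi_scaled):
    print(f"NPI = {raw:.3f} STB/ft/month, scaled = {scaled:.3f}")
```

Dividing by both lateral length and producing months lets wells of different length and age be compared on one scale, which is the purpose of the index.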

The methodology employed multiple feature selection techniques, including Shapley additive explanations (SHAP), filter methods, wrapper methods, and embedded methods, combined through ensemble selection to comprehensively identify the most influential variables. Then, three machine learning algorithms were tested: (1) multilayer perceptron (MLP), (2) extreme gradient boosting (XGBoost), and (3) random forest (RF), each with hyperparameter optimization.
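One simple way to combine several feature rankings is to average the rank each method assigns to a feature. The sketch below illustrates that idea only: the abstract does not state which aggregation rule was used, so mean-rank aggregation is an assumption, and the feature labels and importance scores are hypothetical.

```python
def rank_features(scores):
    """Map each feature to its rank under one method (1 = most important)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


def ensemble_ranking(score_tables):
    """Average per-method ranks; return features sorted best to worst
    together with their mean ranks. All tables must share the same keys."""
    names = list(score_tables[0])
    per_method = [rank_features(table) for table in score_tables]
    mean_rank = {
        name: sum(ranks[name] for ranks in per_method) / len(per_method)
        for name in names
    }
    return sorted(names, key=lambda name: mean_rank[name]), mean_rank


# Hypothetical importance scores from three selection methods
# (illustrative numbers, not results from the study).
shap_like = {"total_water": 0.40, "tvd": 0.25, "n_stages": 0.10}
filter_like = {"total_water": 0.55, "tvd": 0.20, "n_stages": 0.30}
embedded_like = {"total_water": 0.35, "tvd": 0.45, "n_stages": 0.05}

order, mean_rank = ensemble_ranking([shap_like, filter_like, embedded_like])
print(order)
```

Working with ranks rather than raw scores avoids having to reconcile the different importance scales that each selection method produces.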

Feature importance analysis revealed that (1) total base water volume used for hydraulic fracturing had the largest impact, followed by (2 and 3) surface longitude and latitude of wells, (4) total proppant weight used for hydraulic fracturing, (5) true vertical depth (TVD), (6) base water volume per fracture stage, (7) proppant weight per fracture stage, and (8) number of fracture stages. Total base water volume and total proppant weight thus appear to play key roles. Well locations (surface longitude and latitude) also have a significant impact on NPI, probably because they mark geological sweet spots. The number of fracture stages likely ranked lowest because it increases with lateral length, which is already built into NPI; both affect oil production positively.

In ML model comparison, RF delivered the highest prediction accuracy (R²: 0.653, NRMSE: 0.121), closely followed by XGBoost (R²: 0.652, NRMSE: 0.122). The MLP model performed considerably worse (R²: 0.528, NRMSE: 0.142), suggesting it could not adequately capture the complex nonlinear relationships between completion parameters and production outcomes. The strong performance of tree-based ensemble methods indicates their suitability for this application domain.
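For reference, the two reported metrics can be computed as in the sketch below. The abstract does not say how NRMSE was normalized; dividing RMSE by the range of the observed values is an assumption here (normalization by the mean or the standard deviation is also common). The sample values are hypothetical.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the observed range (assumed normalization)."""
    spread = max(y_true) - min(y_true)
    if spread == 0:
        raise ValueError("NRMSE is undefined when the observed range is zero")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / spread


# Hypothetical normalized NPI values and predictions (illustration only).
observed = [0.0, 0.5, 1.0]
predicted = [0.1, 0.5, 0.9]

print(f"R2 = {r_squared(observed, predicted):.3f}")
print(f"NRMSE = {nrmse(observed, predicted):.3f}")
```

When the target is already scaled to the range zero to one, range-normalized NRMSE is equal to the plain RMSE.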

This research has several implications for shale field development. First, it establishes NPI not only as a key production performance indicator but also as a reliable metric for forecasting well production. Second, the predictors we employed, including coordinates, TVD, and well completion data, are all known prior to production. Finally, the predictive performance of the ML models enables operators to make production forecasts at the completion stage of wells, potentially improving capital allocation decisions and field development planning. This methodology can be extended to other unconventional plays to optimize drilling and completion strategies based on data-driven insights rather than conventional rules of thumb.

References cited:

Lu, C., Jiang, H., Yang, J., Wang, Z., Zhang, M., & Li, J. (2022). Shale oil production prediction and fracturing optimization based on machine learning. Journal of Petroleum Science and Engineering, 217, 110900.

Luo, G., Tian, Y., Sharma, A., & Ehlig-Economides, C. (2019). Eagle Ford well insights using data-driven approaches. International Petroleum Technology Conference D021S026R003.

Sorkhabi, R., & Panja, P. (2021). Not all shales play the same game: Comparative analysis of US shale oil formations by reverse engineering and petroleum systems. Unconventional Resources Technology Conference, 26–28 July 2021 (pp. 878–892). DOI 10.15530/urtec-2021-5660