2025 AIChE Annual Meeting

(29b) Machine Learning Assessment of Factors Controlling Oil Production in Bakken Formation, Williston Basin, USA

Authors

Yeonpyeong Jo - Presenter, Inha University

Palash Panja, University of Utah

Rasoul Sorkhabi, University of Utah

Milind Deo

Shale oil production has increased roughly 1.5-fold since the COVID-19 pandemic, from 6 million barrels of oil per day (BOPD) in 2021 to 9 million BOPD in 2024. Despite oil price volatility, operators continue to drill new wells with greater vertical depths or longer laterals. New field developments require better production forecasting methods, which in turn depend on a quantitative understanding of the factors that control oil production from horizontal wells in tight formations. The complexity of these factors makes physics-based modeling very challenging and creates opportunities for data-driven approaches. A number of researchers have used machine learning (ML) methods to predict oil production from unconventional plays. For example, using data from numerical simulations, Lu et al. (2022) employed deep neural networks to forecast shale oil production rates (R²: 0.86-0.87). Shale plays vary greatly in their production because of both geological and well completion factors (Sorkhabi et al., 2021). In this study, we conducted data-driven ML modeling for the Bakken play in the Williston Basin to predict oil production as influenced by various factors.

Cumulative oil production depends on a set of interacting factors. To examine these factors in the Bakken Formation, we used the Normalized Production Index (NPI; Sorkhabi and Panja, 2021), expressed in STB/ft/month, i.e., cumulative oil production divided by lateral length and by the number of producing months. NPI values were calculated individually for 2,093 horizontal wells drilled in the Bakken. The data were normalized to the range of zero to one for machine learning applications.
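As a minimal sketch of the NPI definition above, the following Python snippet computes NPI for a few wells and rescales the values to the zero-to-one range. The well values are hypothetical, the function names are illustrative, and min-max scaling is an assumption: the abstract states only that the data were normalized to between zero and one, not which scaling was used or which variables it was applied to.

```python
def normalized_production_index(cum_oil_stb, lateral_length_ft, producing_months):
    """Return NPI in STB/ft/month: cumulative oil / lateral length / producing months."""
    if lateral_length_ft <= 0 or producing_months <= 0:
        raise ValueError("lateral length and producing months must be positive")
    return cum_oil_stb / lateral_length_ft / producing_months


def min_max_scale(values):
    """Rescale values linearly to [0, 1] (assumed scaling; not confirmed by the source)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: map everything to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical wells: (cumulative oil in STB, lateral length in ft, producing months).
wells = [
    (120_000.0, 10_000.0, 24),
    (90_000.0, 5_000.0, 24),
    (200_000.0, 10_000.0, 20),
]

npi_values = [normalized_production_index(*w) for w in wells]
npi_scaled = min_max_scale(npi_values)

print(npi_values)  # [0.5, 0.75, 1.0]
print(npi_scaled)  # [0.0, 0.5, 1.0]
```

Dividing by both lateral length and producing months lets wells of different size and age be compared on one scale, which is why the second well, with half the lateral length of the first, has the higher NPI despite its lower cumulative volume.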

The methodology employed multiple feature selection techniques, including Shapley additive explanations (SHAP), filter methods, wrapper methods, and embedded methods, combined through ensemble selection to comprehensively identify the most influential variables. Then, three machine learning algorithms were tested: (1) multilayer perceptron (MLP), (2) extreme gradient boosting (XGBoost), and (3) random forest (RF), each with hyperparameter optimization.
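The abstract does not specify how the individual feature selection results were combined. One common option, shown here purely as an illustrative sketch, is mean-rank aggregation: each method ranks the features, and the consensus order is given by the average rank position. The rankings below are hypothetical and do not reproduce the study's results, and the feature names and the function `mean_rank_aggregate` are made up for the example.

```python
def mean_rank_aggregate(rankings):
    """Combine per-method rankings into a consensus order by mean rank.

    rankings: dict mapping method name -> list of features, most important first.
    Returns a list of (feature, mean_rank) tuples, best (lowest mean rank) first.
    """
    methods = list(rankings)
    features = set(rankings[methods[0]])
    totals = {f: 0 for f in features}
    for method in methods:
        order = rankings[method]
        if set(order) != features or len(order) != len(features):
            raise ValueError(f"{method} must rank every feature exactly once")
        for position, feature in enumerate(order, start=1):
            totals[feature] += position
    mean_ranks = {f: totals[f] / len(methods) for f in features}
    # Break ties alphabetically so the output is deterministic.
    return sorted(mean_ranks.items(), key=lambda item: (item[1], item[0]))


# Hypothetical rankings from four selection approaches (not the study's output).
rankings = {
    "shap":     ["total_base_water", "total_proppant", "tvd", "n_stages"],
    "filter":   ["total_base_water", "tvd", "total_proppant", "n_stages"],
    "wrapper":  ["total_proppant", "total_base_water", "tvd", "n_stages"],
    "embedded": ["total_base_water", "total_proppant", "n_stages", "tvd"],
}

consensus = mean_rank_aggregate(rankings)
for feature, rank in consensus:
    print(f"{feature}: mean rank {rank}")
```

Averaging ranks rather than raw importance scores avoids having to reconcile the different scales that SHAP, filter, wrapper, and embedded methods produce.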

Feature importance analysis showed that (1) total base water volume used for hydraulic fracturing had the largest impact, followed by (2 and 3) surface longitude and latitude of wells, (4) total proppant weight used for hydraulic fracturing, (5) true vertical depth (TVD), (6) base water volume per fracture stage, (7) proppant weight per fracture stage, and (8) number of fracture stages. Total base water volume and total proppant weight thus appear to play key roles. Well locations (surface longitude and latitude) also have a significant impact on NPI, probably because they mark geological sweet spots. The number of fracture stages likely ranked lowest because the number of stages increases with lateral length, which is already accounted for in NPI; both affect oil production positively.

In the comparison of ML models, RF delivered the highest prediction accuracy (R²: 0.653, NRMSE: 0.121), closely followed by XGBoost (R²: 0.652, NRMSE: 0.122). The MLP model performed considerably worse (R²: 0.528, NRMSE: 0.142), suggesting that it could not adequately capture the complex nonlinear relationships between completion parameters and production outcomes. The strong performance of tree-based ensemble methods indicates their suitability for this application domain.
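For readers less familiar with the two metrics reported above, the sketch below computes R² and NRMSE from paired observed and predicted values. The NRMSE convention is an assumption: here RMSE is divided by the range of the observed values, whereas other conventions divide by the mean or the standard deviation, and the abstract does not say which was used. The sample values are hypothetical.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


def nrmse(y_true, y_pred):
    """RMSE divided by the observed range (assumed normalization convention)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    span = max(y_true) - min(y_true)
    if span == 0:
        raise ValueError("range of observed values is zero")
    return math.sqrt(mse) / span


# Hypothetical normalized NPI values and model predictions.
observed = [0.0, 0.25, 0.5, 0.75, 1.0]
predicted = [0.1, 0.2, 0.5, 0.7, 0.9]

r2 = r_squared(observed, predicted)
err = nrmse(observed, predicted)
print(f"R2 = {r2:.3f}, NRMSE = {err:.3f}")
```

When the target has already been scaled so that its observed values span exactly zero to one, the range-normalized NRMSE coincides with the plain RMSE on the scaled values.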

This research has several far-reaching implications for shale field development. First, it establishes NPI not only as a key production performance indicator but also as a reliable metric for forecasting well production. Second, the predictors we employed, including coordinates, TVD, and well completion data, are all known before production begins. Finally, the strong predictive performance of the ML models enables operators to make reliable production forecasts at the completion stage of wells, potentially improving capital allocation decisions and field development planning. This methodology can be extended to other unconventional plays to optimize drilling and completion strategies based on data-driven insights rather than conventional rules of thumb.

References cited:

Lu, C., Jiang, H., Yang, J., Wang, Z., Zhang, M., & Li, J. (2022). Shale oil production prediction and fracturing optimization based on machine learning. Journal of Petroleum Science and Engineering, 217, 110900.

Luo, G., Tian, Y., Sharma, A., & Ehlig-Economides, C. (2019). Eagle Ford well insights using data-driven approaches. International Petroleum Technology Conference D021S026R003.

Sorkhabi, R., & Panja, P. (2021). Not all shales play the same game: Comparative analysis of US shale oil formations by reverse engineering and petroleum systems. Unconventional Resources Technology Conference, 26–28 July 2021 (pp. 878–892). DOI: 10.15530/urtec-2021-5660
