As new reservoir discoveries grow increasingly scarce, optimizing existing oil extraction methods has become crucial for sustainable resource management [1]. Waterflooding remains one of the most widely implemented secondary recovery techniques because of its relative simplicity and cost-effectiveness. However, the method raises significant environmental concerns: it consumes approximately three barrels of water for every barrel of oil produced [2], and the resulting water use and freshwater contamination are especially problematic in regions already facing water scarcity. Improving the efficiency of waterflooding operations can therefore reduce water consumption, lessen environmental impact, extend reservoir lifespans, and improve economic outcomes at the same time. Optimizing these operations is difficult, however, because reservoir behavior is highly nonlinear and numerous operational constraints must be satisfied, which calls for innovative computational approaches. These constraints are essential for realistic optimization: our previous work demonstrated that neglecting water-related constraints can significantly overestimate oil production and inflate economic benefits, underscoring the need for constraint-aware methods [3,4].
This study introduces a constrained reinforcement learning (RL) framework for optimizing waterflooding operations while adapting to evolving feasible-region boundaries in oil reservoirs. We maximize net present value (NPV) through direct policy search in a continuous action space, optimizing well-pressure control strategies subject to operational constraints that include flowrate limits, platform capacity, and water-cut limits that maintain economic viability. Because oil reservoir simulations are high-dimensional and computationally expensive, we developed surrogate models to guide the optimization efficiently. First, we collected 2,000 operational configurations using Latin Hypercube Sampling and evaluated each one to obtain flowrate information at the injectors and producers. With this offline data, we trained two learning models: a deep feedforward neural network (FNN) that predicts cumulative NPV with R² exceeding 0.98, and a binary classifier that distinguishes feasible from infeasible operations [5,6]. These initial models provide guidance, but on-the-fly learning is crucial because a static classifier may misclassify unexplored regions as feasible, potentially leading to infeasible operational strategies. In our RL framework, the agent continuously adjusts parameterized bottom-hole pressures (BHPs) at injection and production wells, receiving rewards proportional to economic outcomes for feasible operations and calibrated penalties for infeasible ones. These penalties account for both the severity of the violation and the improvement over the previous state, creating a gradient-guided search based on classifier feedback that directs the agent toward feasible regions. When the classifier places an operation in a low-confidence region near the decision boundary, we invoke the full simulator for verification.
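The reward shaping and simulator gating described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the formulation used in this study: the function names, the use of a classifier feasibility probability as the feedback signal, the 0.5 decision threshold, the weights alpha and beta, and the width of the low-confidence band are all hypothetical.

```python
def shaped_reward(npv: float, p_feasible: float, prev_p_feasible: float,
                  scale: float = 1.0, alpha: float = 1.0, beta: float = 0.5,
                  threshold: float = 0.5) -> float:
    """Hypothetical reward: NPV-proportional if feasible, penalty otherwise.

    p_feasible and prev_p_feasible are assumed classifier probabilities
    for the current and previous operating points.
    """
    if p_feasible >= threshold:
        # Feasible operation: reward proportional to the economic outcome.
        return scale * npv
    # Infeasible operation: penalize by how far the point lies below the
    # threshold (severity) and credit any move toward feasibility.
    severity = threshold - p_feasible
    improvement = p_feasible - prev_p_feasible
    return -alpha * severity + beta * improvement


def needs_simulator(p_feasible: float, threshold: float = 0.5,
                    band: float = 0.1) -> bool:
    """Flag points close to the decision boundary for full-simulator checks."""
    return abs(p_feasible - threshold) < band
```

With these assumed weights, an infeasible step that moves toward the feasible region is penalized less than one that makes no progress, which is one simple way to give the agent a directional signal from classifier feedback.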
Newly validated data points are periodically collected and used to incrementally update the classifier [7], allowing the decision boundary to dynamically adapt as exploration progresses. This approach enables us to learn the feasible region iteratively, ultimately converging to operational strategies that satisfy all constraints while maximizing economic benefits.
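As a rough illustration of the periodic classifier update, the sketch below buffers simulator-validated points and applies an incremental update once enough have accumulated. It is not the classifier used in this study: a toy logistic regression trained by stochastic gradient descent stands in for it, and the class names, learning rate, and flush interval are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


class OnlineFeasibilityClassifier:
    """Toy logistic regression updated by SGD (a stand-in classifier)."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, batch: Sequence[Tuple[Sequence[float], int]]) -> None:
        # One SGD pass over the new points only; older data is not revisited.
        for x, y in batch:
            err = self.predict_proba(x) - y
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err


class ValidationBuffer:
    """Collects simulator-validated points and updates the classifier in batches."""

    def __init__(self, clf: OnlineFeasibilityClassifier, flush_every: int = 4) -> None:
        self.clf = clf
        self.flush_every = flush_every
        self.pending: List[Tuple[Sequence[float], int]] = []

    def add(self, x: Sequence[float], feasible: bool) -> bool:
        """Store one validated point; return True if an update was triggered."""
        self.pending.append((x, 1 if feasible else 0))
        if len(self.pending) >= self.flush_every:
            self.clf.update(self.pending)
            self.pending.clear()
            return True
        return False
```

In use, each point verified by the full simulator would be passed to `add`, so the decision boundary shifts only after a batch of validated labels is available.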
References
[1] Muggeridge, A., Cockin, A., Webb, K., Frampton, H., Collins, I., Moulds, T. and Salino, P., 2014. Recovery rates, enhanced oil recovery and technological limits. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 372, p.20120320.
[2] Bailey, B., Crabtree, M., Tyrie, J., Elphick, J., Kuchuk, F., Romano, C. and Roodhart, L., 2000. Water control. Oilfield Review, 12, pp.30-51.
[3] Beykal, B., Boukouvala, F., Floudas, C.A., Sorek, N., Zalavadia, H. and Gildin, E., 2018. Global optimization of grey-box computational systems using surrogate functions and application to highly constrained oil-field operations. Computers & Chemical Engineering, 114, pp.99-110.
[4] Sorek, N., Gildin, E., Boukouvala, F., Beykal, B. and Floudas, C.A., 2017. Dimensionality reduction for production optimization using polynomial approximations. Computational Geosciences, 21, pp.247-266.
[5] Aghayev, Z., Voulanas, D., Gildin, E. and Beykal, B., 2025. Surrogate-assisted optimization of highly constrained oil recovery processes using classification-based constraint modeling. Industrial & Engineering Chemistry Research.
[6] Beykal, B., Aghayev, Z., Onel, O., Onel, M. and Pistikopoulos, E.N., 2022. Data-driven Stochastic Optimization of Numerically Infeasible Differential Algebraic Equations: An Application to the Steam Cracking Process. In Computer Aided Chemical Engineering (Vol. 49, pp. 1579-1584). Elsevier.
[7] Luo, Y., Yin, L., Bai, W. and Mao, K., 2020. An appraisal of incremental learning methods. Entropy, 22(11), p.1190.