2020 Virtual AIChE Annual Meeting

(3s) Models and Solution Approaches to Large-Scale Multistage Stochastic Programs Under Endogenous and/or Exogenous Uncertainties

Bio: I earned my Master's Degree at Carnegie Mellon University. I am currently a Doctoral Candidate at Auburn University, where I focuse on modeling and developing solution strategies for large-scale multistage stochastic programs with endogenous or/and exogenous uncertainties. I was the recipient of a FOCAPO/CPC Young Researcher Travel Grant (2017), a PSE2018 Young Researcher Award (2018), an Auburn University Harry Merriwether Fellowship (2018), and Invited Talks on 2019 AIChE Annual Meeting Computing and Systems Technology Division (CAST) Plenary Session (Talk: Bounds for Multistage Stochastic Programs Under Endogenous Uncertainties).

Research Interests: Stochastic Programming, Mathematical Modeling (Optimization), Machine Learning, Surrogate-Based Optimization.

Stochastic programming:

In this project, my research work focuses on developing models and solution algorithms for stochastic programming problems, especially addressing computational challenges associated with solving large scale multistage stochastic programs (MSSPs) under decision-dependent uncertainty. Uncertainties in optimization problems can be divided into two groups: exogenous, where the realization times and distributions of uncertain parameters are independent of decisions, and endogenous, where the decisions determine realization times and/or distributions of uncertain parameters [1]. The endogenous uncertainties can further be classified as Type I endogenous uncertainty, where decisions influence the probability distributions, and Type II endogenous uncertainty, where decisions impact the realization time of uncertainties. There are many engineering problems from chemical process industry, manufacturing and project management that can be effectively modeled as multistage stochastic programs. The scenarios represent the possible combinations of outcomes of uncertain parameters. The non-anticipativity constraints (NACs) are introduced to prevent the decisions anticipating the unrealized future outcomes. However, these models grow quickly and become computationally intractable for real-world size problems.

I constructed a library of MSSP models with Type II endogenous uncertainty. I also develped a multistage stochastic program model for artificial lift infrastructure planning of shale gas horizontal wells under production rate uncertainty (endogenous). With the addition of this model, the current library contains 12 MSSP models with Type II endogenous uncertainty. I compiled comparative solution statistics for these models using the appropriate state-of-the-art commercial solvers. Most MSSP models suffer from the exponential growth of scenarios and number of non-anticipativity constraints and are limited for real-world size problems, which large-scale models cannot be generated with currently available memories in most standard workstations and their optimal solutions cannot be obtained within reasonable time.

The study into the structure of the MSSP models in the library lead me to develop a suite of solution approaches that exploit this structure and rely on decomposition techniques to solve the space and time complexities associated with solving these problems. We developed GKDA (generalized knapsack problem-based decomposition) [2], AEEV (absolute expected value solution) [3], mLD (modified Lagrangian decomposition) [4], and RGKDA (relaxed generalized knapsack decomposition algorithm) [5] for efficiently solving MSSPs under both endogenous and exogenous uncertainties. GKDA generates and solves a series of basic individual knapsack problems based on the realizations of uncertainties, and determine the recourse actions along each knapsack sub-problems. AEEV is derived from the absolute expected solution value approach, and follows the nature of decision-making in sequence, which utilize already known information and take the expected results for unrealized information to determine current state decisions, and then realize new uncertainty, and repeat this process along the planning horizon. RKDA generates a tight dual bound by employing GKDA to solve a relaxed version of the original MSSP. Similar to the existing Lagrangian relaxation schemes, the mLD regards the non-anticipativity constraints (NACs) as complicating constraints, however utilizes the special structure of the non-anticipativity constraints (NACs) to formulate a tighter dual problem, which also eliminates half of the Lagrangian multipliers. AEEV, GKDA, and RKDA all implicitly incorporate the non-anticipativity constraints (NACs) in framework and avoid the computational complexity due to number of scenarios and non-anticipativity constraints (NACs).

We applied the proposed algorithms to solve three MSSP models from the library: (1) pharmaceutical R&D pipeline planning problem under uncertain outcomes [6], (2) process network synthesis problem under uncertain yields [7], and (3) artificial lift infrastructure planning problem under uncertain production rate [8]. The results revealed that the AEEV yielded primal bounds within 1% of the true solutions for all tested cases with up to 16,384 scenarios, and generated these implementable solutions up to four orders of magnitude faster than solving the original MSSP models. The proposed primal and dual bounding approaches obtain 4.66% relative gap for the largest instances of the pharmaceutical R&D pipeline planning problem with 1,048,576 scenarios, which cannot even be generated in CPLEX 12.6.3.

Surrogate-based optimization:

The random forest algorithm has been used successfully as a general method for classification and regression since its introduction in 2001[9]. Random forest models consist of a large number of individual uncorrelated decision trees as an ensemble and generate their predictions using the expected value of individual decision tree predictions[9]. It has been shown that these models can effectively fit nonlinear and complex interactions of input features and can be used in surrogate-based optimization to approximate models without closed analytic forms. In this contribution, I examine surrogate-based optimization of unconstrained nonlinear programming problems employing random forest models as surrogates. We introduce a MILP for a random forest model and specifically focus on developing solution approaches that rely on the decomposition of the original MILP exploiting its unique structure. The MILP of random forest contains the complicating variables xi, representing the same input value in dimension i for all individual uncorrelated decision trees. We introduce additional continuous variables zi,t as the input values for each tree t in dimension i with the equality xi = zi,t , which is similar to the initial non-anticipativity constraints of two-stage stochastic program models.

Exploiting this structure of the MILP model, I specifically developed five different decomposition approaches that are inspired by Sample Average Approximation (SAA), Lagrangian decomposition with cutting plane method (LD), Progressive Hedging (PH), Alternating Direction Method of Multipliers (ADMM), and Benders Decomposition (BD). To address and reduce the impact of extreme individual trees, I investigated clustering strategies using four of the decomposition approaches (SAA-group, LD-group, PHLD-group, ADLD-group). I applied the developed solution approaches (SAA, LD, PHLD, ADLD, BD, SAA-group, LD-group, PHLD-group, ADLD-group) to solve the MILPs of 95 trained RF models, which were developed to approximate the functions from Virtual Library of Simulation Experiments [10]. The computational experiments revealed that the developed approaches were able to obtain optimality gaps for certain tested functions within a few iterations.

Teaching Interests:

I am able to teach all core chemical engineering courses, with particular interest in teaching optimization, design, thermodynamics, and engineering math/programming. I also look forward to the opportunity to develop graduate courses in stochastic programming, optimization and mathematical modeling.

References

  1. Gupta, V., & Grossmann, I. E. (2011). Solution strategies for multistage stochastic programming with endogenous uncertainties. Computers & Chemical Engineering, 35(11), 2235-2247.
  2. Zeng, Z., & Cremaschi, S. (2019). A general primal bounding framework for large-scale multistage stochastic programs under endogenous uncertainties. Chemical Engineering Research & Design: Transactions of the Institution of Chemical Engineers Part A, 141.
  3. Zeng, Z., Christian, B., & Cremaschi, S. (2018). A generalized knapsack-problem based decomposition heuristic for solving multistage stochastic programs with endogenous and/or exogenous uncertainties. Industrial & Engineering Chemistry Research, 57(28), 9185-9199.
  4. Zeng, Z., & Cremaschi, S. (2018). A Relaxed Knapsack-Problem Based Decomposition Heuristic for Large-Scale Multistage Stochastic Programs. In Computer Aided Chemical Engineering (Vol. 43, pp. 519-524). Elsevier.
  5. Zeng, Z., & Cremaschi, S. (2018). A new Lagrangian relaxation approach for multistage stochastic programs under endogenous uncertainties. In Computer Aided Chemical Engineering (ESCAPE30), May 24-27, 2020, Milano, Italy.
  6. Goel, V., Grossmann, I. E. (2006). A class of stochastic programs with decision dependent uncertainty. Mathematical programming, 108(2), 355-394
  7. Colvin, M., & Maravelias, C. T. (2008). A stochastic programming approach for clinical trial planning in new drug development. Computers & Chemical Engineering, 32(11), 2626-2642.
  8. Zeng, Z., & Cremaschi, S. (2017). Artificial lift infrastructure planning for shale gas producing horizontal wells. In: Proceedings of the FOCAPO/CPC, Tuscan, AZ, USA, pp. 8–12
  9. Breiman L (2001) Random forests. Mach Learn 45:5–32.
  10. https://www.sfu.ca/~ssurjano/optimization.html