We present an extension to the OR-Gym Python package [1] for multi-echelon inventory management [2]. The inventory management module in the package builds upon the multi-echelon inventory problem described by Glasserman and Tayur [3] to address inventory replenishment decisions in make-to-order supply chains. The platform allows modeling general supply chains with serial, assembly, distribution, or tree topologies [4]. The supply chain model includes lead times between nodes in the network, production capacity limits, and inventory holding limits, and each node is either an inventory-holding node or a manufacturing node. Three optimization approaches are presented in the context of a single-product, multi-period, centralized system in which a single retailer faces uncertain stationary consumer demand, with unfulfilled demand either backlogged or lost. The three paradigms used to optimize the daily inventory replenishment requests at each node are (1) deterministic linear programming, (2) multi-stage stochastic linear programming, and (3) reinforcement learning, each implemented in a rolling-horizon fashion. The deterministic model replaces the uncertain demand with its mean in each period. The multi-stage stochastic model uses three realizations of the uncertain demand (low, mean, and high) for the first six stages and assumes a deterministic system for the remainder of the optimization window; the scenario generation minimizes the Wasserstein-1 distance between the probability distribution of the scenario tree and the true demand distribution [5]. The reinforcement learning model relies on Proximal Policy Optimization (PPO), in which two neural networks act as actor and critic, respectively [6]. The performance of the three methods is compared in terms of profit (reward), service level, and inventory profiles. The results indicate that, of the three approaches, stochastic modeling yields the largest increase in profit, while reinforcement learning produces more balanced inventory policies that would potentially respond well to network disruptions. Furthermore, deterministic models perform well in determining dynamic reorder policies whose profitability is comparable to that of reinforcement learning.
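To make the rolling-horizon implementation concrete, the sketch below runs the generic loop shared by all three approaches: at each period a replenishment plan is computed over the remaining window, only the first-period decision is applied, and the window then rolls forward. The environment ID and the solve_window placeholder are illustrative assumptions, not the package's or paper's exact interface.

    # Minimal sketch of the rolling-horizon loop used by all three
    # approaches. 'InvManagement-v1' follows OR-Gym's environment naming
    # but should be treated as an assumed ID, and solve_window is a
    # hypothetical stand-in for the deterministic LP, stochastic LP,
    # or RL decision rule.
    import or_gym

    def rolling_horizon(env, solve_window, num_periods=30):
        obs = env.reset()
        total_reward = 0.0
        for t in range(num_periods):
            plan = solve_window(obs, window=num_periods - t)  # plan the whole window
            obs, reward, done, info = env.step(plan[0])       # apply first period only
            total_reward += reward
            if done:
                break
        return total_reward

    env = or_gym.make("InvManagement-v1")  # assumed inventory environment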
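The Wasserstein-1 criterion admits a simple closed form in one dimension: among all three-point equiprobable approximations of a demand distribution, the W1 distance is minimized by placing the support points at the 1/6, 1/2, and 5/6 quantiles. The sketch below uses this construction for one stage of the scenario tree; the Poisson demand model is illustrative only, and the paper's exact procedure (with the mean as the middle branch) may differ.

    # Hedged sketch: three-branch (low/mid/high) scenario generation for
    # one stage. For an equiprobable three-point approximation, W1 to the
    # true distribution is minimized at the 1/6, 1/2, and 5/6 quantiles.
    from scipy.stats import poisson

    demand = poisson(mu=20)                       # assumed demand model
    probs = [1 / 3, 1 / 3, 1 / 3]                 # equiprobable branches
    points = [demand.ppf(q) for q in (1 / 6, 1 / 2, 5 / 6)]
    print(list(zip(points, probs)))               # branch values and weights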
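For the reinforcement learning approach, a PPO agent with separate actor and critic networks can be trained directly against the Gym-style environment. The snippet below uses stable-baselines3 as the training library, which is an assumption on our part; the paper specifies only PPO with two neural networks acting as actor and critic.

    # Hedged sketch: PPO training on the inventory environment.
    # stable-baselines3 is an assumed library choice; any PPO
    # implementation compatible with the Gym API would serve.
    import or_gym
    from stable_baselines3 import PPO

    env = or_gym.make("InvManagement-v1")      # assumed environment ID
    model = PPO("MlpPolicy", env, verbose=1)   # actor and critic MLPs
    model.learn(total_timesteps=100_000)       # illustrative training budget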
[1] C. D. Hubbs, H. D. Perez, O. Sarwar, N. V. Sahinidis, I. E. Grossmann, and J. M. Wassick, “OR-Gym: A Reinforcement Learning Library for Operations Research Problems,” arXiv, Aug. 2020.
[2] A. J. Clark and H. Scarf, “Optimal Policies for a Multi-Echelon Inventory Problem,” Management Science, vol. 6, no. 4, pp. 475–490, Jul. 1960.
[3] P. Glasserman and S. Tayur, “Sensitivity analysis for base-stock levels in multiechelon production-inventory systems,” Management Science, vol. 41, no. 2, pp. 263–281, 1995.
[4] D. Simchi-Levi and Y. Zhao, “Performance evaluation of stochastic multi-echelon inventory systems: A survey,” Advances in Operations Research, vol. 2012, 2012.
[5] R. Hochreiter and G. C. Pflug, “Financial scenario generation for stochastic multi-stage decision processes as facility location problems,” Annals of Operations Research, vol. 152, no. 1, pp. 257–272, 2007.
[6] J. Schulman, P. Moritz, S. Levine, M. I. Jordan, and P. Abbeel, “High-dimensional continuous control using generalized advantage estimation,” arXiv, Jun. 2016.