2018 AIChE Annual Meeting
(393e) Distributed Approximate Dynamic Programming (dADP) for Data-Driven Optimal Control of Nonlinear Systems
Approximate dynamic programming (ADP) is a data-driven optimal control strategy [1, 2]: historical datasets are exploited to iteratively train the control policy and the value function toward the optimal solution that satisfies Bellman's principle of optimality. For systems with continuous (infinite) state spaces, the principle of optimality takes the form of the Hamilton-Jacobi-Bellman (HJB) equation. For nonlinear input-affine systems, the HJB equation can be transformed so that the model functions are replaced by information extracted from data; by choosing suitable basis functions for the optimal control policy and the value function, the policy and value iterations can then be solved approximately as a regression problem [3].
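To make the regression step concrete, the following is a minimal sketch (not the authors' code, and only one possible data-based formulation in the spirit of [3]): it assumes a value-function approximation V(x) ≈ wᵀφ(x) with a hypothetical quadratic basis φ, and a dataset of trajectory segments (x_t, x_{t+T}, c_t), where c_t is the stage cost integrated over the segment under the current policy, so that the Bellman relation wᵀ(φ(x_{t+T}) − φ(x_t)) = −c_t can be fitted by least squares.

```python
import numpy as np

def phi(x):
    """Hypothetical polynomial basis for a 2-state system (an assumption)."""
    x1, x2 = x
    return np.array([x1**2, x1 * x2, x2**2])

def fit_value_weights(segments):
    """Least-squares fit of w from the integral Bellman relation
       w^T (phi(x_{t+T}) - phi(x_t)) = -c_t  for each trajectory segment."""
    A = np.array([phi(x_next) - phi(x_now) for x_now, x_next, _ in segments])
    b = np.array([-c for _, _, c in segments])
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return w

# Usage with synthetic placeholder data (stand-ins for measured segments):
rng = np.random.default_rng(0)
segments = [(rng.standard_normal(2), rng.standard_normal(2),
             float(rng.random())) for _ in range(50)]
print(fit_value_weights(segments))  # fitted value-function parameters w
```

No model functions appear in the fit; only trajectory data and the chosen basis enter the regression, which is what makes the approach data-driven.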
In this work, we propose a novel approach, different from that of [3], which formulates the HJB equation directly as a nonlinear regression problem, so that the approximate control policy and value function can be obtained directly. The framework is also extended to cases where input constraints are present. This formulation is suited to solving ADP in a big-data setting, where a centralized optimization exploiting all the data in the regression procedure is infeasible. Specifically, we employ the alternating direction method of multipliers (ADMM) [4], one of the most widely used distributed optimization algorithms, as well as its accelerated version [5], to regress the parameters of the optimal control policy and value function. We call the resulting framework distributed approximate dynamic programming (dADP), since it adaptively updates the parameters toward the optimum throughout the distributed optimization iterations, and we illustrate the method on a chemical reactor example.
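As an illustration of the distributed regression step, here is a minimal sketch of consensus ADMM in the generic form given in [4] (not the authors' exact algorithm, and shown for a linear least-squares surrogate of the nonlinear regression): each data block i holds its own slice (A_i, b_i) of the regression data, solves a local subproblem in parallel, and all blocks are driven to agree on one parameter vector z.

```python
import numpy as np

def consensus_admm(blocks, rho=1.0, iters=100):
    """Solve  min_w sum_i ||A_i w - b_i||^2  via consensus ADMM [4]."""
    n = blocks[0][0].shape[1]
    z = np.zeros(n)
    u = [np.zeros(n) for _ in blocks]            # scaled dual variables
    # Precompute each block's local normal-equation data.
    facs = [(A.T @ A + rho * np.eye(n), A.T @ b) for A, b in blocks]
    for _ in range(iters):
        # Local (parallelizable) least-squares updates.
        w = [np.linalg.solve(H, Atb + rho * (z - ui))
             for (H, Atb), ui in zip(facs, u)]
        # Global averaging (consensus) update.
        z = np.mean([wi + ui for wi, ui in zip(w, u)], axis=0)
        # Dual ascent on the consensus constraint w_i = z.
        u = [ui + wi - z for ui, wi in zip(u, w)]
    return z

# Usage with synthetic placeholder data split into 4 blocks:
rng = np.random.default_rng(1)
w_true = rng.standard_normal(3)
blocks = []
for _ in range(4):
    A = rng.standard_normal((25, 3))
    blocks.append((A, A @ w_true + 0.01 * rng.standard_normal(25)))
print(consensus_admm(blocks))  # approximately recovers w_true
```

The accelerated variant of [5] adds a Nesterov-type extrapolation step on the consensus and dual variables; it is omitted here for brevity.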
References
[1] Hou, Z. S., & Wang, Z. (2013). From model-based control to data-driven control: Survey, classification and perspective. Inf. Sci., 235, 3-35.
[2] Lee, J. H., & Wong, W. (2010). Approximate dynamic programming approach for process control. J. Process Control, 20(9), 1038-1048.
[3] Luo, B., et al. (2014). Data-based approximate policy iteration for affine nonlinear continuous-time optimal control design. Automatica, 50(12), 3281-3290.
[4] Boyd, S., et al. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1), 1-122.
[5] Goldstein, T., et al. (2014). Fast alternating direction optimization methods. SIAM J. Imaging Sci., 7(3), 1588-1623.