2024 AIChE Annual Meeting
(578f) Lyapunov Neural ODE Control (L-NODEC) for Robust Policy Search in Nonlinear Systems
[2]. Solving continuous-time optimal control problems (OCPs) is generally challenging, especially when nonlinear dynamics and path constraints are present, since they involve infinitely many time-varying decision variables. Various optimization techniques for solving OCPs have been developed, including direct methods that discretize the time-varying functions and indirect methods that solve the necessary conditions of optimality; there is also growing interest in learning-based methods that parameterize control policies using neural networks (NNs) [3]. NN policies are particularly attractive due to their scalability to higher-dimensional problems, as well as their representation capacity owing to the universal approximation theorem [4].
We solve a continuous-time OCP with a NN control policy via neural ordinary differential equations (NODEs) [5], which replace the discrete nature of hidden layers with a parameterized ODE, yielding continuous-depth models. The perspective of treating function learning as a dynamical system offers significant advantages in time-series modeling (e.g., [6], [7]). Although NODEs have been used in systems with unknown dynamics to simultaneously learn and control the dynamics (e.g., [8], [9]), we take a different approach by leveraging known physics. Consequently, our NODE structure applies the NN representation only to the control policy, which is embedded into known differential equations describing the temporal evolution of the states [10]; this can be viewed as an instance of the universal differential equation framework [11].
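As a rough illustration of this physics-embedded structure (our own sketch, not the authors' implementation), the code below wraps a feedforward NN policy u = pi_theta(t, x) inside known dynamics dx/dt = f(x, u) and rolls the coupled system forward with a simple explicit Euler scheme. The double integrator dynamics, network sizes, horizon, and learning rate are illustrative placeholders.

```python
import torch
import torch.nn as nn

# Illustrative "known physics": a double integrator with state x = (position, velocity).
def known_dynamics(x, u):
    return torch.stack([x[1], u.squeeze()])

class PolicyNODE(nn.Module):
    """NN control policy embedded in known ODE dynamics (sketch, not the paper's code)."""

    def __init__(self, state_dim=2, hidden=32):
        super().__init__()
        self.policy = nn.Sequential(
            nn.Linear(state_dim + 1, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x0, t_final=1.0, n_steps=100):
        """Roll out dx/dt = f(x, pi_theta(t, x)) with an explicit Euler scheme."""
        dt = t_final / n_steps
        x, traj = x0, [x0]
        for k in range(n_steps):
            t = torch.tensor([k * dt])
            u = self.policy(torch.cat([t, x]))    # only the policy is learned
            x = x + dt * known_dynamics(x, u)     # the known physics stays fixed
            traj.append(x)
        return torch.stack(traj)

# Mayer (terminal-cost) training: steer the state to a target equilibrium x_star.
model = PolicyNODE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x0 = torch.tensor([2.0, 0.0])
x_star = torch.tensor([0.0, 0.0])
for epoch in range(200):
    optimizer.zero_grad()
    trajectory = model(x0)
    terminal_cost = torch.sum((trajectory[-1] - x_star) ** 2)
    terminal_cost.backward()
    optimizer.step()
```

Because the dynamics are differentiable, the terminal cost can be backpropagated through the entire rollout to update only the policy parameters.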
In particular, we are interested in the Mayer problem, a terminal-cost continuous-time OCP, with the goal of steering the system to a desired equilibrium point [12]. We build upon the aforementioned “physics-embedded” NODE control structure by incorporating Lyapunov theory into the policy's loss during learning [13], hence the name Lyapunov NODE control (L-NODEC). This is achieved by defining the deviation of the system states from the desired equilibrium as an exponentially stable control Lyapunov function (ES-CLF). The policy is learnt such that the closed-loop dynamics satisfy the condition of exponential stability, which is achieved by quantifying the violation of the local invariance property [14]. To address constraints, we explicitly enforce the input constraints by appropriately parameterizing the policy via a sigmoid function in the output layer. Path constraints are formulated as nonlinear constraints and, in practice, enforced as soft constraints via quadratic penalty functions [15].
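As a concrete, purely illustrative reading of these ingredients, the sketch below takes V(x) = ||x - x*||^2 as the ES-CLF, penalizes pointwise violations of the exponential-stability condition dV/dt + kappa*V <= 0 along a rollout, bounds the input with a scaled sigmoid output layer, and adds a quadratic penalty for a path constraint g(x) <= 0. The decay rate kappa, penalty weight rho, constraint function g, and the finite-difference approximation of dV/dt are our own assumptions rather than the paper's exact formulation.

```python
import torch

def V(x, x_star):
    """Candidate ES-CLF: squared deviation from the desired equilibrium."""
    return torch.sum((x - x_star) ** 2)

def bounded_control(z, u_min=-1.0, u_max=1.0):
    """Sigmoid output layer enforcing the input constraint u_min <= u <= u_max."""
    return u_min + (u_max - u_min) * torch.sigmoid(z)

def lyapunov_loss(traj, x_star, dt, kappa=2.0):
    """Average violation of the exponential-stability condition dV/dt + kappa*V <= 0.

    dV/dt is approximated by finite differences along the rollout; the exact
    violation term used in L-NODEC may differ.
    """
    loss = 0.0
    for k in range(len(traj) - 1):
        v_now = V(traj[k], x_star)
        v_dot = (V(traj[k + 1], x_star) - v_now) / dt
        loss = loss + torch.relu(v_dot + kappa * v_now)   # local-invariance violation
    return loss / (len(traj) - 1)

def path_penalty(traj, g, rho=10.0):
    """Quadratic soft penalty for a path constraint g(x) <= 0 (g is a placeholder)."""
    return rho * sum(torch.relu(g(x)) ** 2 for x in traj) / len(traj)

# Composite training objective for one rollout `traj` of the policy NODE:
#   terminal Mayer cost + Lyapunov-violation term + path-constraint penalty
# loss = torch.sum((traj[-1] - x_star) ** 2) \
#        + lyapunov_loss(traj, x_star, dt) \
#        + path_penalty(traj, g)
```

Weighting the Lyapunov-violation term against the constraint penalty is precisely what gives rise to the stability-versus-constraint-satisfaction tradeoff discussed below.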
We guarantee that, for the unconstrained OCP, L-NODEC is exponentially stable and capable of converging to the equilibrium even without complete information on the terminal states. Furthermore, as a consequence of stability, a theoretical upper bound for adversarial robustness can be established with respect to uncertainty in the initial conditions.
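For intuition only (the precise statement and constants in L-NODEC may differ), a bound of this flavor follows directly from exponential decay of the chosen Lyapunov function: if the learned closed loop satisfies $\dot{V}(x) \le -\kappa V(x)$ with $V(x) = \|x - x^*\|^2$, then $V(x(T)) \le e^{-\kappa T} V(x(0))$, i.e., $\|x(T) - x^*\| \le e^{-\kappa T/2}\,\|x(0) - x^*\|$. Hence, for two initial conditions $x_0$ and $x_0 + \delta$, the triangle inequality gives
$$\|x(T;\, x_0 + \delta) - x(T;\, x_0)\| \;\le\; e^{-\kappa T/2}\big(\|x_0 + \delta - x^*\| + \|x_0 - x^*\|\big),$$
so the deviation in the final states shrinks exponentially with the horizon and grows at most linearly with the perturbation $\|\delta\|$.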
We demonstrate that L-NODEC outperforms NODEC on a benchmark continuous-time double integrator, where the policy recommends alternative trajectories that reach the desired terminal equilibrium state in lower inference time. Furthermore, its robustness to adversarial attacks confirms the theoretical upper bound on deviations in the final states due to variance in the initial conditions. In the case of a constrained OCP, we also demonstrate the systematic tradeoff between stability and constraint satisfaction, though constraint enforcement via penalty functions is still an active area of research [16].
[1] M. Athans and P. L. Falb, Optimal Control: An Introduction to the Theory and Its Applications. Courier Corporation, 2007.
[2] F. L. Lewis, D. Vrabie, and V. L. Syrmos, Optimal Control. John Wiley & Sons, 2012.
[3] M. Hertneck, J. Köhler, S. Trimpe, and F. Allgöwer, “Learning an approximate model predictive controller with guarantees,” IEEE Control Systems Letters, vol. 2, no. 3, pp. 543–548, 2018. [Online]. Available: http://dx.doi.org/10.1109/LCSYS.2018.2843682
[4] A. R. Barron, “Universal approximation bounds for superpositions of a sigmoidal function,” IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 930–945, 1993.
[5] R. T. Q. Chen, Y. Rubanova, J. Bettencourt, and D. K. Duvenaud, “Neural ordinary differential equations,” Advances in Neural Information Processing Systems, vol. 31, 2018.
[6] A. Rahman, J. Drgoňa, A. Tuor, and J. Strube, “Neural ordinary differential equations for nonlinear system identification,” in Proceedings of the American Control Conference, 2022, pp. 3979–3984.
[7] A. J. Linot, J. W. Burby, Q. Tang, P. Balaprakash, M. D. Graham, and R. Maulik, “Stabilized neural ordinary differential equations for long-time forecasting of dynamical systems,” Journal of Computational Physics, vol. 474, p. 111838, 2023.
[8] S. Bachhuber, I. Weygers, and T. Seel, “Neural ODEs for data-driven automatic self-design of finite-time output feedback control for unknown nonlinear dynamics,” IEEE Control Systems Letters, 2023.
[9] C. Chi, “NODEC: Neural ODE for optimal control of unknown dynamical systems,” arXiv preprint arXiv:2401.01836, 2024.
[10] I. O. Sandoval, P. Petsagkourakis, and E. A. del Rio-Chanona, “Neural ODEs as feedback policies for nonlinear optimal control,” IFAC-PapersOnLine, vol. 56, no. 2, pp. 4816–4821, 2023.
[11] C. Rackauckas, Y. Ma, J. Martensen, C. Warner, K. Zubov, R. Supekar, D. Skinner, A. Ramadhan, and A. Edelman, “Universal differential equations for scientific machine learning,” arXiv preprint arXiv:2001.04385, 2020.
[12] A. E. Bryson, Applied Optimal Control: Optimization, Estimation and Control. Routledge, 2018.
[13] I. D. J. Rodriguez, A. D. Ames, and Y. Yue, “LyaNet: A Lyapunov framework for training neural ODEs,” 2022.
[14] A. D. Ames, K. Galloway, K. Sreenath, and J. W. Grizzle, “Rapidly exponentially stabilizing control Lyapunov functions and hybrid zero dynamics,” IEEE Transactions on Automatic Control, vol. 59, no. 4, pp. 876–891, 2014.
[15] R. M. Freund, “Penalty and barrier methods for constrained optimization,” 2004.
[16] T. Antony and M. J. Grant, “Path constraint regularization in optimal control problems using saturation functions,” AIAA Atmospheric Flight Mechanics Conference, 2018.