2021 Annual Meeting
(346n) Reinforcement Learning with Neural Feedback Policies
The control policy is a single neural network that receives the state of the system as input and outputs the corresponding control actions. The controller is trained in closed loop using gradient-based optimization via discrete adjoint sensitivities of the dynamic model with respect to the neural network parameters. The architecture is inspired by the REINFORCE algorithm from reinforcement learning, but given the explicit availability of the system's dynamics, sensitivities may be used directly to calculate the gradient of the loss function over the parameter space of the controller. A discretize-then-optimize approach, which leverages reverse-mode automatic differentiation for the correct estimation of sensitivities, is used to avoid the instability problems of alternative approaches.
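The following is a minimal sketch of this idea, not the authors' implementation: the dynamics are discretized with explicit Euler, a small neural network maps states to controls, and unrolling the trajectory allows reverse-mode automatic differentiation to supply the discrete adjoint sensitivities. The dynamics function `f`, the network sizes, and the step count are illustrative assumptions.

```python
import jax
import jax.numpy as jnp

def init_policy(key, state_dim=2, hidden=32, control_dim=1):
    """Initialize a small two-layer tanh policy network."""
    k1, k2 = jax.random.split(key)
    return {
        "W1": jax.random.normal(k1, (state_dim, hidden)) * 0.1,
        "b1": jnp.zeros(hidden),
        "W2": jax.random.normal(k2, (hidden, control_dim)) * 0.1,
        "b2": jnp.zeros(control_dim),
    }

def policy(params, x):
    """Neural feedback policy: state in, bounded control action out."""
    h = jnp.tanh(x @ params["W1"] + params["b1"])
    return jnp.tanh(h @ params["W2"] + params["b2"])

def f(x, u):
    """Illustrative nonlinear dynamics (placeholder for the process model)."""
    return jnp.array([x[1], -jnp.sin(x[0]) + u[0]])

def rollout(params, x0, dt=0.02, steps=250):
    """Discretize-then-optimize rollout: explicit Euler over a fixed horizon."""
    def step(x, _):
        u = policy(params, x)
        x_next = x + dt * f(x, u)  # explicit Euler step of the closed loop
        return x_next, (x, u)
    x_final, (xs, us) = jax.lax.scan(step, x0, None, length=steps)
    return x_final, xs, us
```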
Gradient-based optimization is applied directly over the parameterization of the neural network controller to minimize a running and terminal cost over a fixed time interval. As a result, we construct a policy that handles continuous nonlinear optimal control problems in the same spirit as, but orders of magnitude more efficiently than, standard policy gradient learning whenever a dynamical model is available.
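Continuing the sketch above, a quadratic running and terminal cost and plain gradient descent stand in for the (unspecified) cost and optimizer; `jax.grad` performs the reverse-mode sweep through the unrolled trajectory, which is what yields the discrete adjoint sensitivities with respect to the parameters.

```python
def loss(params, x0, dt=0.02, steps=250):
    """Discretized running cost plus terminal cost over the fixed horizon."""
    x_final, xs, us = rollout(params, x0, dt, steps)
    running = dt * jnp.sum(xs ** 2) + dt * 0.1 * jnp.sum(us ** 2)
    terminal = 10.0 * jnp.sum(x_final ** 2)
    return running + terminal

@jax.jit
def train_step(params, x0, lr=1e-2):
    """One gradient descent step on the controller parameters."""
    grads = jax.grad(loss)(params, x0)  # reverse-mode AD = discrete adjoints
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

params = init_policy(jax.random.PRNGKey(0))
x0 = jnp.array([1.0, 0.0])  # illustrative initial state
for _ in range(500):
    params = train_step(params, x0)
```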
We test the proposed technique on challenging nonlinear optimal control problems from process engineering where the governing dynamical system is available.