2021 Annual Meeting
(246c) Pathologies of Neural Networks As Models of Discrete-Time Dynamical Systems
An obvious remedy is to construct the neural network architecture so that it can directly learn the continuous-time system (e.g., ResNets and neural ODEs [3]). We will also discuss other ways to deal with the different pathologies, for example the use of RevNet architectures to avoid noninvertibility (which, however, does not avoid the wrong bifurcations) [4].
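As a concrete illustration of why a RevNet-style architecture is invertible by construction, here is a minimal sketch of an additive-coupling block in the spirit of [4]; the state dimension, layer widths, and activations are illustrative assumptions, not the architecture used in the talk.

```python
import torch
import torch.nn as nn

class ReversibleBlock(nn.Module):
    """Additive coupling block (RevNet-style [4]): invertible by construction,
    so the learned discrete-time map cannot be noninvertible."""
    def __init__(self, dim=2, hidden=32):
        super().__init__()
        half = dim // 2
        self.F = nn.Sequential(nn.Linear(half, hidden), nn.Tanh(), nn.Linear(hidden, half))
        self.G = nn.Sequential(nn.Linear(half, hidden), nn.Tanh(), nn.Linear(hidden, half))

    def forward(self, x):
        # Split the state and couple the halves additively.
        x1, x2 = x.chunk(2, dim=-1)
        y1 = x1 + self.F(x2)
        y2 = x2 + self.G(y1)
        return torch.cat([y1, y2], dim=-1)

    def inverse(self, y):
        # Exact inverse: undo the two additive updates in reverse order.
        y1, y2 = y.chunk(2, dim=-1)
        x2 = y2 - self.G(y1)
        x1 = y1 - self.F(x2)
        return torch.cat([x1, x2], dim=-1)
```

Invertibility guarantees every state has a unique preimage, but, as noted above, it does not by itself prevent the learned map from exhibiting bifurcations the true system does not have.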
While fixed-timestep learned flow maps exhibit the pathologies listed above, this does not render them inapplicable to continuous-time dynamics. In the second portion of this talk, we describe an approach in which we train a finite-timestep flow map on variable-timestep data, and then use automatic differentiation to extract from it an approximation of the infinitesimal generator (the ODE right-hand side) of the system. We demonstrate that this approach converges to the true ODE for a number of test cases.
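A minimal sketch of the differentiation step, assuming a flow-map network flow_map(x, h) has been trained on variable-timestep pairs so that flow_map(x, 0) ≈ x; the names and layer sizes below are hypothetical. Since the true flow satisfies d/dh φ_h(x)|_{h=0} = f(x), differentiating the learned map with respect to the timestep at h = 0 yields an approximation of the right-hand side:

```python
import torch
import torch.nn as nn

# Hypothetical flow-map network for a 2-D system: input is [x1, x2, h],
# output is the predicted state a (variable) timestep h later.
flow_map = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 2))

def rhs_estimate(x):
    """Approximate the ODE right-hand side f(x) = d/dh flow_map(x, h) |_{h=0}
    via automatic differentiation with respect to the timestep input."""
    h = torch.zeros(x.shape[0], 1, requires_grad=True)
    out = flow_map(torch.cat([x, h], dim=-1))
    # One gradient pass per output component gives the column d out_i / dh;
    # summing over the batch is valid because each sample has its own h.
    cols = [torch.autograd.grad(out[:, i].sum(), h, retain_graph=True)[0]
            for i in range(out.shape[1])]
    return torch.cat(cols, dim=-1)
```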
Finally, we consider the approach of using a loss template based on a standard numerical integration algorithm (such as Runge–Kutta, or forward or backward Euler) to train a neural network to approximate the ODE directly. We demonstrate by analysis and example that the approximate ODE will differ from the truth systematically, in a way that depends on the algorithm used to template the neural network. This "mirror" backward error analysis is intuitively related to the forward error these algorithms entail in their solutions of initial value problems.
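A sketch of two such loss templates, assuming mean-squared-error training on observed pairs (x_n, x_{n+1}) separated by a timestep h; the network f_theta and its sizes are illustrative assumptions, not the networks used in the talk.

```python
import torch
import torch.nn as nn

# Hypothetical network approximating the ODE right-hand side of a 2-D system.
f_theta = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))

def euler_template_loss(x_n, x_np1, h):
    """Forward-Euler template: penalize the mismatch between the observed
    next state and one explicit Euler step through f_theta."""
    pred = x_n + h * f_theta(x_n)
    return torch.mean((x_np1 - pred) ** 2)

def rk4_template_loss(x_n, x_np1, h):
    """Classical fourth-order Runge-Kutta template: same idea, but the
    systematic deviation of f_theta from the true right-hand side
    shrinks with the order of the template."""
    k1 = f_theta(x_n)
    k2 = f_theta(x_n + 0.5 * h * k1)
    k3 = f_theta(x_n + 0.5 * h * k2)
    k4 = f_theta(x_n + h * k3)
    pred = x_n + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return torch.mean((x_np1 - pred) ** 2)
```

Because the template is itself an approximate integrator, minimizing either loss drives f_theta toward the "modified" right-hand side whose template-integrated trajectories match the data, which is precisely the systematic deviation the analysis characterizes.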
[1] Gicquel, N., Anderson, J. S., and Kevrekidis, I. G. (1998). Noninvertibility and resonance in discrete-time neural networks for time-series processing. Physics Letters A, 238(1), 8–18. doi:10.1016/s0375-9601(97)00753-6
[2] Rico-Martínez, R., Adomaitis, R. A., and Kevrekidis, I. G. (2000). Noninvertibility in neural networks. Computers and Chemical Engineering, 24, 2417–2433. doi:10.1016/s0098-1354(00)00599-8
[3] Chen, R. T. Q., Rubanova, Y., Bettencourt, J., and Duvenaud, D. (2018). Neural ordinary differential equations. arXiv preprint arXiv:1806.07366.
[4] Gomez, A. N., Ren, M., Urtasun, R., and Grosse, R. B. (2017). The reversible residual network: Backpropagation without storing activations. arXiv preprint arXiv:1707.04585.
Figure: Frequency locking in the neural network approximation of the discrete-time Brusselator model. In each subplot, the dynamics return to the same state after n iterations (e.g., n = 13 in the first subplot). Connecting the n successive states (red points) in time order forms a polygon with n vertices.