2022 Annual Meeting
(432e) Fast Symbolic Regression with Constraints
There has been recent interest in applying mixed-integer nonlinear programming (MINLP) to the symbolic regression (SR) problem [2-3]. This approach has the advantage of returning the provably optimal SR model and of allowing constraints on the model response to be enforced naturally. Unfortunately, the mathematical optimization approach requires significant computational effort; as a result, exact MINLP approaches are limited to small problems.
We propose an SR algorithm that first relaxes the integrality constraints of the MINLP formulation in [3] and solves the resulting inexpensive NLP. We then use the values of the relaxed integer variables to probabilistically assign variables, constants, or operators to the nodes of the SR expression tree, and we solve another NLP to refine the constants of the resulting expression. Our algorithm returns SR expressions with lower error than those found by solving the MINLP exactly, yet it runs orders of magnitude faster. In addition, it yields interpretable regression models with lower error than those returned by other popular machine learning packages.
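A minimal sketch of this relax-and-round idea is shown below for a toy depth-one expression tree. The node names, operator set, and SciPy-based constant refinement are illustrative assumptions, not the formulation of [3]; the relaxed values would in practice come from the NLP relaxation.

```python
import numpy as np
from scipy.optimize import least_squares

OPS = {"+": np.add, "*": np.multiply, "-": np.subtract}
rng = np.random.default_rng(0)

def sample_tree(relaxed):
    """Round the relaxed assignment variables by sampling: relaxed[node] maps
    each candidate symbol to its 0-1 relaxation value, used here as a probability."""
    tree = {}
    for node, weights in relaxed.items():
        symbols = list(weights)
        probs = np.array([weights[s] for s in symbols], dtype=float)
        probs /= probs.sum()                      # normalize the relaxed values
        tree[node] = rng.choice(symbols, p=probs)
    return tree

def evaluate(tree, consts, x):
    """Evaluate a fixed depth-one tree: root operator applied to two leaves."""
    leaf = lambda i: x if tree[f"leaf{i}"] == "x" else consts[i]
    return OPS[tree["root"]](leaf(0), leaf(1))

def refine_constants(tree, x, y):
    """Second NLP: fit the constants of the sampled expression by least squares."""
    res = least_squares(lambda c: evaluate(tree, c, x) - y, x0=np.ones(2))
    return res.x, res.cost

# Usage on toy data y = 2*x, with hypothetical relaxed assignment values.
x = np.linspace(0.0, 1.0, 50); y = 2.0 * x
relaxed = {"root":  {"+": 0.1, "*": 0.8, "-": 0.1},
           "leaf0": {"x": 0.9, "const": 0.1},
           "leaf1": {"x": 0.2, "const": 0.8}}
tree = sample_tree(relaxed)
consts, err = refine_constants(tree, x, y)
```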
We also leverage the mathematical optimization component of our algorithm to enforce constraints on the model response surface; to our knowledge, ours is the first SR algorithm to do so. We show that constrained SR allows users to impose domain knowledge, yielding models that generalize better than those generated by unconstrained SR.
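As an illustration of how a response constraint can enter the constant-fitting step, the sketch below requires the fitted expression to be nondecreasing on a grid of check points. The fixed expression structure, constraint grid, and SLSQP solver are assumptions made for this example, not the exact formulation used here.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
y = x**2 + 0.05 * rng.standard_normal(30)    # noisy data from a nondecreasing truth
grid = np.linspace(0.0, 1.0, 101)            # points where the constraint is checked

def model(c, t):
    # A fixed expression structure (e.g., produced by the rounding step): c0*t + c1*t**2
    return c[0] * t + c[1] * t**2

def sse(c):
    # Objective: sum of squared errors on the training data
    return np.sum((model(c, x) - y) ** 2)

# Monotonicity on the grid: finite differences of the response must be >= 0.
cons = {"type": "ineq", "fun": lambda c: np.diff(model(c, grid))}
fit = minimize(sse, x0=np.zeros(2), constraints=[cons], method="SLSQP")
print(fit.x)
```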
References
[1] Schmidt, M. and H. Lipson, Distilling free-form natural laws from experimental data, Science, 324, 81-85, 2009.
[2] Cozad, A. and N. V. Sahinidis, A global MINLP approach to symbolic regression, Mathematical Programming, 170, 97-119, 2018.
[3] Kim, J., S. Leyffer and P. Balaprakash, Learning symbolic expressions: Mixed-integer formulations, cuts, and heuristics, https://arxiv.org/abs/2102.08351, 2021.