(593b) Regularized Symbolic Regression with PySTAR
2025 AIChE Annual Meeting, Computing and Systems Technology Division
10: Software Tools and Implementations for Process Systems Engineering I
Here, we present recent advances in symbolic regression (SR) and PySTAR. Instead of the Sum of Squared Residuals (SSR) used in previous studies, we use regularized objectives, including the Bayesian Information Criterion (BIC), as the model fitness metric to address the bias-variance trade-off of STAR surrogates. Regularized objectives balance predictive accuracy against model complexity by adding to the traditional SSR a regularization term that penalizes the number of non-zero parameters. This yields models that are not only accurate but also simple and interpretable. To demonstrate this regularization capability and its implementation in PySTAR, we build surrogates for critical minerals processes using both SSR and regularized objectives, and compare them in optimization frameworks. We also perform a benchmarking study comparing STAR with other surrogate modelling techniques, including deep learning and regularization-based methods. Our results show that the regularized SR expressions are among the most accurate models, while their simplicity facilitates optimization and interpretability.
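As a minimal sketch of how a regularized objective can change which model is selected, the snippet below compares SSR with one common form of BIC, n ln(SSR/n) + k ln(n), which assumes independent Gaussian errors. The abstract does not state the exact expression used in PySTAR, so this form, the toy data, and the two candidate models are assumptions for illustration only, and the code does not call the PySTAR API.

```python
import math


def ssr(y_true, y_pred):
    """Sum of squared residuals between observations and predictions."""
    return sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))


def bic(y_true, y_pred, n_nonzero_params):
    """BIC under i.i.d. Gaussian errors: n*ln(SSR/n) + k*ln(n).

    This is a common textbook form (constants dropped), assumed here;
    it is not taken from the PySTAR source.
    """
    n = len(y_true)
    residual = ssr(y_true, y_pred)
    if residual <= 0.0:
        raise ValueError("BIC is undefined for a perfect fit (SSR = 0)")
    return n * math.log(residual / n) + n_nonzero_params * math.log(n)


# Hypothetical observations, roughly following y = x with noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.1, 0.9, 2.2, 2.9, 4.1, 4.8]

# Two hypothetical candidate models:
# "simple" is y = x (one non-zero parameter);
# "complex" stands in for a 4-parameter expression that fits slightly better.
candidates = [
    {"name": "simple", "pred": [1.0 * xi for xi in x], "k": 1},
    {"name": "complex", "pred": [0.0, 1.0, 2.1, 3.0, 4.0, 4.9], "k": 4},
]

# Score every candidate under both fitness metrics.
for c in candidates:
    c["ssr"] = ssr(y, c["pred"])
    c["bic"] = bic(y, c["pred"], c["k"])

# Lower is better for both metrics.
best_by_ssr = min(candidates, key=lambda c: c["ssr"])
best_by_bic = min(candidates, key=lambda c: c["bic"])

print("Selected by SSR:", best_by_ssr["name"])
print("Selected by BIC:", best_by_bic["name"])
```

In this toy case the 4-parameter candidate has the lower SSR, but its k ln(n) penalty outweighs the gain in fit, so BIC prefers the 1-parameter model. That is the accuracy-versus-complexity balance described above.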