2024 AIChE Annual Meeting
(193d) TorchSISSO: A Python Package for Explainable AI Using the Sure Independence Screening and Sparsifying Operator Method with GPU Support
Symbolic regression (SR) is a class of explainable AI methods that seek optimal closed-form (possibly nonlinear) expressions for a given target property (or output) in terms of a set of input features that may be related to the target [3, 4]. Early work on SR focused on genetic algorithm-based methods [5], whereas more recent work has focused on sparse linear regression-based approaches. For example, the sure independence screening and sparsifying operator (SISSO) method [6] uses compressed sensing with feature expansion to perform SR. The feature expansion step repeatedly combines a set of primary features using unary and binary operators until a large enough feature set is available (typically on the order of 10^7 to 10^10 features). The sure independence screening (SIS) method [7] is then used to identify a small set of promising features, on which one can solve a full L0-regularized regression problem to construct the final set of descriptors. SISSO has been used to learn interpretable descriptors for many properties, including phase stability [8], catalyst performance [9], and glass transition temperature [10].
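The three steps described above (feature expansion, SIS, and an L0 search) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the SISSO or TorchSISSO implementation: the function names (expand_features, sis, l0_fit), the small operator set, the use of absolute Pearson correlation as the screening score, and the toy data are all hypothetical choices made here. It also performs a single SIS pass, whereas the published method repeats SIS on residuals when building higher-dimensional descriptors.

```python
import itertools
import numpy as np

def expand_features(names, X):
    """One round of feature expansion with a small, illustrative operator set.
    Assumes strictly positive inputs so sqrt and division are well defined."""
    feats = {n: X[:, i] for i, n in enumerate(names)}
    out = dict(feats)
    for n, v in feats.items():                      # unary operators
        out[f"({n})^2"] = v ** 2
        out[f"sqrt({n})"] = np.sqrt(v)
    for (a, va), (b, vb) in itertools.combinations(feats.items(), 2):
        out[f"({a}+{b})"] = va + vb                 # binary operators
        out[f"({a}*{b})"] = va * vb
        out[f"({a}/{b})"] = va / vb
        out[f"({b}/{a})"] = vb / va
    return list(out), np.column_stack(list(out.values()))

def sis(F, target, k):
    """Screening step: indices of the k features with the largest
    absolute Pearson correlation with the target."""
    Fc = F - F.mean(axis=0)
    tc = target - target.mean()
    denom = np.linalg.norm(Fc, axis=0) * np.linalg.norm(tc)
    score = np.abs(Fc.T @ tc) / np.where(denom > 0, denom, np.inf)
    return np.argsort(score)[::-1][:k]

def l0_fit(F, y, idx, d):
    """Sparsifying step: exhaustive least squares over every size-d subset
    of the screened features (with an intercept); returns the best subset."""
    best = (np.inf, None, None)
    for combo in itertools.combinations(idx, d):
        A = np.column_stack([F[:, list(combo)], np.ones(len(y))])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        rmse = np.sqrt(np.mean((A @ coef - y) ** 2))
        if rmse < best[0]:
            best = (rmse, combo, coef)
    return best

# Toy data (an assumption for illustration): y = 3*x0*x1 + 0.5, no noise.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 2.0, size=(50, 3))
y = 3.0 * X[:, 0] * X[:, 1] + 0.5

names, F = expand_features(["x0", "x1", "x2"], X)
screened = sis(F, y, k=5)
rmse, combo, coef = l0_fit(F, y, screened, d=1)
print([names[i] for i in combo], coef, rmse)
```

The exhaustive loop in l0_fit is what makes the screening step necessary: its cost grows combinatorially with the number of candidate features, so it is only practical on the small subset that SIS retains.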
Although powerful, current implementations of SISSO face some important challenges that have prevented its widespread use. First, the original SISSO repository [11] is implemented in Fortran, making it challenging for users to install and run (especially in cloud-based computing environments). Second, the feature expansion step in [11] is hard-coded and cannot be directly modified. This matters because, as we show through a couple of simple examples, a potentially incomplete expansion can result in a failure to learn the true symbolic expression. Third, the combinatorial expansion of the feature space can be slow or even infeasible depending on the available memory. To address these issues, we introduce a new Python package, TorchSISSO [12], that implements an enhanced version of the SISSO method. We base our implementation on the open-source machine learning library Torch [13] so that all of the internal operations can be GPU-accelerated if desired. TorchSISSO is pip installable (i.e., pip install torchsisso), so it can be readily installed locally or in cloud environments. Through a series of examples, we show that TorchSISSO can discover physically relevant equations up to 18x faster (and with higher accuracy) than the original SISSO implementation. We also implement a novel filtering strategy that enables application of the method to problems with high-dimensional primary feature spaces (common in materials discovery problems).
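The point about incomplete expansion can be illustrated with a toy case. The sketch below is not one of the examples from the talk and does not use the TorchSISSO API; the helper names (expand, best_match), the two-operator set, and the target y = x0*x1*x2 are assumptions chosen here for illustration. With a single expansion round no candidate feature involves all three inputs, so the true expression is out of reach; a second round makes it available.

```python
import itertools
import numpy as np

def expand(feats):
    """One expansion round: apply + and * to every pair of current features."""
    out = dict(feats)
    for (a, va), (b, vb) in itertools.combinations(feats.items(), 2):
        out[f"({a}+{b})"] = va + vb
        out[f"({a}*{b})"] = va * vb
    return out

def best_match(feats, y):
    """Return (name, |Pearson r|) of the feature most correlated with y."""
    scores = {n: abs(np.corrcoef(v, y)[0, 1]) for n, v in feats.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

# Toy data (an assumption for illustration): the target needs two nested products.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 2.0, size=(100, 3))
y = x[:, 0] * x[:, 1] * x[:, 2]

primary = {f"x{i}": x[:, i] for i in range(3)}
depth1 = expand(primary)   # pairwise combinations only
depth2 = expand(depth1)    # combinations of combinations

name1, r1 = best_match(depth1, y)
name2, r2 = best_match(depth2, y)
print(f"depth 1: {len(depth1)} features, best {name1}, |r| = {r1:.4f}")
print(f"depth 2: {len(depth2)} features, best {name2}, |r| = {r2:.4f}")
```

The same toy case also shows the cost side of the trade-off: each extra round multiplies the number of candidates, which is why memory and speed become the limiting factors for realistic operator sets.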
References:
[1] Hermann, J., DiStasio Jr, R. A., & Tkatchenko, A. (2017). First-principles models for van der Waals interactions in molecules and materials: Concepts, theory, and applications. Chemical Reviews, 117(6), 4714-4758.
[2] Angelov, P. P., Soares, E. A., Jiang, R., Arnold, N. I., & Atkinson, P. M. (2021). Explainable artificial intelligence: an analytical review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 11(5), e1424.
[3] Aldeia, G. S. I., & de França, F. O. (2021, June). Measuring feature importance of symbolic regression models using partial effects. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 750-758).
[4] Wang, Y., Wagner, N., & Rondinelli, J. M. (2019). Symbolic regression in materials science. MRS Communications, 9(3), 793-805.
[5] Koza, J. R. (1994). Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4, 87-112.
[6] Ouyang, R., Curtarolo, S., Ahmetcik, E., Scheffler, M., & Ghiringhelli, L. M. (2018). SISSO: A compressed-sensing method for identifying the best low-dimensional descriptor in an immensity of offered candidates. Physical Review Materials, 2(8), 083802.
[7] Fan, J., & Lv, J. (2008). Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society Series B: Statistical Methodology, 70(5), 849-911.
[8] Schleder, G. R., Acosta, C. M., & Fazzio, A. (2019). Exploring two-dimensional materials thermodynamic stability via machine learning. ACS Applied Materials & Interfaces, 12(18), 20149-20157.
[9] Han, Z. K., Sarker, D., Ouyang, R., Mazheika, A., Gao, Y., & Levchenko, S. V. (2021). Single-atom alloy catalysts designed by first-principles calculations and artificial intelligence. Nature Communications, 12(1), 1833.
[10] Pilania, G., Iverson, C. N., Lookman, T., & Marrone, B. L. (2019). Machine-learning-based predictive modeling of glass transition temperatures: a case of polyhydroxyalkanoate homopolymers and copolymers. Journal of Chemical Information and Modeling, 59(12), 5013-5025.
[11] https://github.com/rouyang2017/SISSO
[12] https://pypi.org/project/TorchSisso/
[13] Collobert, R., Bengio, S., & Mariéthoz, J. (2002). Torch: a modular machine learning software library.