2025 AIChE Annual Meeting

(412a) Machine Learning Enabled Discovery of Novel Polymer Structures with Experimental Validation for Gas Separation Applications

Authors

Michael Sun, Massachusetts Institute of Technology
Minghao Guo, Massachusetts Institute of Technology
Wojciech Matusik, Massachusetts Institute of Technology
The design and testing of novel materials for gas separation membranes has historically relied on expert intuition, empirical testing, and synthetic iteration of polymer structures. This process is time-consuming and expensive, and it has largely limited the development of new membranes. Herein, we present a machine learning model that generates novel polymer structures and accurately predicts their gas transport properties, as corroborated by experimental validation. Importantly, the model can learn from a small dataset of microporous polymers (114 examples) by integrating expert knowledge into a neurosymbolic architecture. We confirm the predictive accuracy of the model by synthesizing and characterizing three novel PIM-polyimide structures proposed by the model. We tested the resulting membranes for several industrially relevant gas separations, including CO2/CH4, O2/N2, and H2/N2, and observed good agreement between predictions and measurements. Excitingly, a fraction of the new structures proposed by the model are predicted to have separation properties that surpass both those of the small existing dataset and the most up-to-date upper bounds. This study demonstrates, for the first time, an AI-driven workflow that can draw from a small dataset to propose novel polymer structures and predict their transport properties, with experimental validation. Ultimately, this study highlights the importance of integrating domain expertise with machine learning for generating and validating synthetically accessible structures with accurate transport property predictions. We believe this same blueprint for integrating expert domain knowledge into a computational design workflow is particularly attractive for domains characterized by complex designs and without a large library of data from which to draw.