2025 AIChE Annual Meeting

(172a) AI-Driven Inverse Design of PFAS-Free Surfactants: A Multi-Scale Framework Integrating Quantum Chemistry, Thermodynamics, and Machine Learning

Authors

Daeun Shin, Ewha Womans University
Jonggeol Na, Carnegie Mellon University
Due to increasing concerns about the environmental and health risks associated with per- and polyfluoroalkyl substances, or PFAS, there has been a significant push toward identifying safer, sustainable alternatives. In response, we developed a computational workflow aimed at the rational design and screening of non-PFAS surfactants. This approach focuses primarily on two essential interfacial properties: surface tension and critical micelle concentration (CMC), both of which are key for evaluating surfactant performance. Surface tension, which is arguably the most critical property for developing competitive PFAS-free surfactants, remains underrepresented in public databases due to the high cost and complexity of experimental measurement. To build a reliable dataset suitable for transfer learning and inverse design, we collected SMILES representations of surfactant structures from literature and online sources, and labeled them through first-principles COSMO-RS calculations.

We assembled a broad library of candidate surfactants spanning a diverse range of ionic and nonionic compounds. All calculations were performed at 298.15 K using COSMOtherm, in combination with the flatsurf and vacuum-phase approximations, to compute gas–liquid interfacial tensions. Our methodology builds on previous work in COSMO-based quantum chemical modeling, specifically COSMO-RS (the Conductor-like Screening Model for Real Solvents). The model we used was originally developed to estimate interfacial tension (IFT) in liquid–liquid and gas–liquid systems. We adapted it for high-throughput screening by optimizing both the computational settings and the calculation flow. To this end, we integrated a Python-based automation pipeline that manages input generation, COSMOtherm execution, and output extraction across multiple candidate structures in parallel, which significantly reduced the manual effort required to screen large chemical libraries. To estimate CMC values, we followed a thermodynamic approach previously introduced in the COSMO-RS literature.

We first curated over 3,000 surfactant SMILES using this method. Calculations were run for 950 surfactants at a mole fraction of 10⁰, and separately for 1,300 surfactants at a mole fraction of 10⁻⁶, 1,200 at 10⁻⁵, and 1,000 at 10⁻⁴. To see how surface tension changes with mole fraction, we calculated 100 values at equal intervals over the 10⁻¹⁰ to 10⁻¹ range and found that the degree of decrease varied with several factors, such as ionicity and the structure of the functional group.
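The parallel screening loop and the mole-fraction sweep described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the names `Job`, `log_spaced_mole_fractions`, `build_jobs`, and `run_all` are hypothetical, the 100 sweep points are assumed to be equally spaced on a logarithmic scale, and the step that actually writes an input file, invokes COSMOtherm, and parses its output is left as a caller-supplied `runner` function because those details are not given here.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Job:
    """One calculation: a surfactant at a given mole fraction and temperature."""
    smiles: str
    mole_fraction: float
    temperature_k: float = 298.15  # temperature stated in the abstract


def log_spaced_mole_fractions(n: int = 100,
                              lo_exp: float = -10.0,
                              hi_exp: float = -1.0) -> List[float]:
    """Return n mole fractions from 10**lo_exp to 10**hi_exp.

    Assumption: "equal intervals" is read as equal spacing in log10(x).
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    step = (hi_exp - lo_exp) / (n - 1)
    return [10.0 ** (lo_exp + i * step) for i in range(n)]


def build_jobs(smiles_list: Sequence[str],
               mole_fractions: Sequence[float]) -> List[Job]:
    """Cross every structure with every mole fraction."""
    return [Job(s, x) for s in smiles_list for x in mole_fractions]


def run_all(jobs: Sequence[Job],
            runner: Callable[[Job], float],
            max_workers: int = 4) -> Dict[Job, float]:
    """Run `runner` on each job in parallel and collect results by job.

    `runner` is a placeholder for the real per-job work (input generation,
    external solver call, output extraction); threads are sufficient when
    that work mostly waits on an external process.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(runner, jobs))  # map preserves job order
    return dict(zip(jobs, results))
```

In use, `run_all(build_jobs(smiles, log_spaced_mole_fractions()), runner)` would return one value per structure and mole fraction, from which the surface tension trend for each surfactant can be read off.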

Because large-scale experimental surface tension data are scarce, especially for emerging surfactant candidates, the labeled dataset we generated fills an important gap. It also serves as a foundation for machine learning model development. In short, combining quantum chemistry, thermodynamic modeling, and machine learning gives us a practical way to explore PFAS alternatives without needing to run as many expensive or time-consuming lab experiments.