Machine learning force fields (MLFFs) have the potential to transform computational catalyst discovery by running simulations far faster than traditional electronic structure methods such as Density Functional Theory (DFT). Yet widespread industrial adoption remains challenging because the priorities of industrial researchers and the ML community often differ. Industrial practitioners demand high-fidelity predictions for specific reactions and catalytic materials, rely on deep expertise in kinetic modeling, and require accuracy on par with DFT, all within strict computational limits. In contrast, the ML community typically leverages massive datasets, sometimes encompassing hundreds of millions of adsorption energies, to build foundational MLFFs, occasionally sacrificing electronic structure precision and relegating catalyst dynamics and reaction kinetics to a secondary role.
This presentation will explore fine-tuning strategies that close the gap between the accuracy of foundational models and the accuracy required for industrial heterogeneous catalysis. As a case study, we will discuss our recently released AQCat25 dataset, which incorporates fidelity improvements and previously missing spin polarization effects for heterogeneous catalysts. We will highlight data generation techniques that prioritize underrepresented transition-state information, using automated kinetic modeling to select new data points strategically. We will also investigate model tuning approaches, such as multi-fidelity schemes, that prevent catastrophic forgetting and preserve the model's general capabilities while it adapts to specialized catalytic systems. These tuning approaches also integrate gradient-informed data selection methods to optimize performance with minimal computational overhead.