2025 AIChE Annual Meeting

Machine Learning Models for Catalytically Relevant Transition Metal Carbide Surfaces

Transition metal carbides (TMCs) have wide-ranging applications in electrochemical and biochemical catalysis. Of particular interest is their potential to reduce greenhouse gas emissions by activating carbon dioxide and converting it into useful products. Because active sites and reaction energetics are highly surface-dependent, catalytic performance is governed not only by composition but also by the specific crystal facet and termination exposed. TMCs have been studied extensively through both experiments and density functional theory (DFT) to better understand their catalytic behavior and stability. However, higher Miller-index facets remain underexplored because structure optimizations are time- and resource-intensive. Our work addresses the under-sampling of these higher Miller-index surfaces by developing and training a machine learning (ML) model that predicts their energies. This approach helps identify thermodynamically favorable high-index facets and can inform targeted synthesis of desirable surfaces.

Low-index surfaces (Miller indices of 1–2) serve as training data for ML models that predict the energies of higher-index (≥3) facets. We first generated a DFT dataset of MoC, WC, VC, and NbC slabs spanning multiple terminations. Structural and elemental descriptors, such as the total number of atoms in the surface, cell area, and Bader charge, are used as model features. Model hyperparameters are extensively optimized because they strongly influence prediction performance. Several models were studied, including support vector regression (SVR), kernel ridge regression (KRR), and random forest regression (RFR). In initial tests, the SVR model yields an RMSE of 0.088 eV/atom, but its predictions still show visible discrepancies. To address this, additional structural and electronic features will be generated to better distinguish data points. Model performance will then be systematically assessed using parity plots, energy distribution comparisons, and error metrics. Altogether, our initial results indicate that data-efficient ML models trained on low-index surfaces can extend to complex, higher-index TMC facets with reliable accuracy and provide a predictive framework to accelerate the discovery of target surfaces for sustainable catalysis.
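The train-on-low-index, test-on-high-index workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "index" means the largest absolute Miller index of a facet, uses a small kernel ridge regression written in plain Python so that it runs without dependencies (in practice a library such as scikit-learn would likely be used), and relies on made-up feature values and energies. The names `SLABS`, `split_by_index`, `fit_krr`, and `predict_krr` are hypothetical.

```python
import math

# Hypothetical records: (Miller index, feature vector, energy in eV/atom).
# Features stand in for descriptors such as atom count and cell area.
# All numbers are invented for illustration and are not from the study.
SLABS = [
    ((1, 0, 0), [0.10, 0.20], -0.52),
    ((1, 1, 0), [0.30, 0.25], -0.47),
    ((1, 1, 1), [0.45, 0.40], -0.41),
    ((2, 1, 0), [0.60, 0.55], -0.36),
    ((2, 1, 1), [0.70, 0.65], -0.33),
    ((2, 2, 1), [0.85, 0.80], -0.30),
    ((3, 1, 0), [0.75, 0.90], -0.28),
    ((3, 2, 1), [0.95, 1.00], -0.25),
]


def max_index(hkl):
    """Largest absolute Miller index (assumed definition of 'index')."""
    return max(abs(i) for i in hkl)


def split_by_index(slabs, cutoff=2):
    """Low-index slabs (<= cutoff) for training, higher-index for testing."""
    train = [s for s in slabs if max_index(s[0]) <= cutoff]
    test = [s for s in slabs if max_index(s[0]) > cutoff]
    return train, test


def rbf(a, b, gamma):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_krr(X, y, gamma=1.0, alpha=1e-3):
    """Kernel ridge regression: solve (K + alpha * I) coef = y."""
    K = [[rbf(a, b, gamma) + (alpha if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)


def predict_krr(X_train, coef, X_new, gamma=1.0):
    """Predict energies as kernel-weighted sums over training points."""
    return [sum(c * rbf(x, xt, gamma) for c, xt in zip(coef, X_train))
            for x in X_new]


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the energies."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


train, test = split_by_index(SLABS)
X_tr, y_tr = [s[1] for s in train], [s[2] for s in train]
X_te, y_te = [s[1] for s in test], [s[2] for s in test]
coef = fit_krr(X_tr, y_tr)
pred = predict_krr(X_tr, coef, X_te)
print(f"RMSE on higher-index facets: {rmse(y_te, pred):.3f} eV/atom")
```

Splitting by Miller index rather than at random makes the test error measure extrapolation to unseen facets, which is the quantity of interest here. The `gamma` and `alpha` values are arbitrary placeholders; they are the kind of settings that would be tuned, for example by cross-validation on the low-index set.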