2024 AIChE Annual Meeting
(368bd) Tying It All Together: Multimodal Learning in Catalyst and Inorganic Crystal Exploration
Machine learning (ML) has become increasingly popular for modeling catalysts and crystals because ML models can serve as surrogates for computationally expensive quantum chemistry calculations. These models are particularly effective at predicting properties such as energies and forces, and they can also be used to generate structures. Graph representations are among the most prevalent and robust representations because they capture the connectivity between atoms in a material.
However, graph representations are not a one-size-fits-all solution. Building a graph of an atomic system requires accurate atomic coordinates, which are often difficult to obtain. For example, metadata such as candidate compositions or characterization data may be available for a new material, yet determining the exact atomic coordinates of a newly proposed catalyst or crystal remains challenging. Graphs are also limited in capturing lattice properties, since they typically do not incorporate lattice information explicitly. To address these limitations, we combine graph representations of the target materials with additional modalities, namely language descriptions and X-ray diffraction (XRD) spectra. This multimodal approach aims to improve both property prediction and structure generation.
Our research has three main parts. First, we developed a language model that predicts the adsorption energy of catalysts with accuracy comparable to conventional graph neural network (GNN) models such as CGCNN and SchNet. Second, by applying multimodal learning to both graph and language representations, we achieved a 10% reduction in mean absolute error (MAE) in adsorption energy prediction compared to the language-only approach. Third, we incorporated XRD spectra as an additional modality alongside graph representations for inorganic crystals, which further improved both property prediction and structure generation.
Because no single modality can capture every aspect of a target material, multimodal learning leverages the strengths of each modality to compensate for the incompleteness of the others. Moreover, this approach enables property prediction and structure generation from text-based metadata and characterization data, which makes it more practical in real-world settings.
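The idea of one modality compensating for another can be illustrated with a small sketch. The abstract does not say how the modalities are fused, so the code below assumes the simplest option, concatenating per-material embeddings and fitting a ridge regressor. The embeddings and energies are synthetic random data, and every name (`fuse`, `fit_ridge`, `relative_reduction`, the embedding sizes) is hypothetical. It is not the authors' model and does not reproduce their results.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    """Mean absolute error between two 1-D arrays."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def relative_reduction(baseline, improved):
    """Fractional error reduction relative to a baseline (0.10 means 10%)."""
    return (baseline - improved) / baseline


def fuse(*embeddings):
    """Assumed fusion scheme: concatenate embeddings along the feature axis."""
    return np.concatenate(embeddings, axis=1)


def _with_bias(features):
    """Append a constant column so the linear model can learn an offset."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_ridge(features, targets, l2=1e-3):
    """Closed-form ridge regression; returns weights with the bias last."""
    x = _with_bias(features)
    gram = x.T @ x + l2 * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ targets)


def predict(weights, features):
    """Apply fitted ridge weights to a feature matrix."""
    return _with_bias(features) @ weights


# Synthetic stand-ins for learned embeddings (assumption: 8 dimensions each).
rng = np.random.default_rng(0)
n_samples = 400
graph_emb = rng.normal(size=(n_samples, 8))
text_emb = rng.normal(size=(n_samples, 8))

# Toy target built to depend on both modalities, plus a little noise.
energy = (
    graph_emb @ rng.normal(size=8)
    + text_emb @ rng.normal(size=8)
    + 0.05 * rng.normal(size=n_samples)
)

train, test = slice(0, 300), slice(300, n_samples)
inputs = {
    "text_only": text_emb,
    "graph_only": graph_emb,
    "fused": fuse(graph_emb, text_emb),
}

# Fit one regressor per input type and record its held-out MAE.
results = {}
for name, feats in inputs.items():
    weights = fit_ridge(feats[train], energy[train])
    results[name] = mean_absolute_error(energy[test], predict(weights, feats[test]))

for name, mae in results.items():
    print(f"{name}: MAE = {mae:.3f}")
print(f"reduction vs. text only: {relative_reduction(results['text_only'], results['fused']):.1%}")
```

Because the toy target is constructed from both embeddings, the fused model has the lower error by design. The sketch shows only the mechanics of fusion and of reporting a relative MAE reduction, and says nothing about the size of the gain on real data.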