2025 AIChE Annual Meeting
(59d) Machine Learning-Based Probabilistic Backmapping for Multiscale Modeling of Phenylalanine Dipeptide Assembly
In this work, we present a machine learning-based backmapping approach that reconstructs atomistic structures from coarse-grained (CG) representations. Our decoder model integrates geometric algebra attention, masked normalizing flows, and neural network architectures; it is trained on atomistic simulation trajectories and learns to predict full atomic coordinates from CG positions. We have trained separate models to predict either Cartesian (XYZ) or internal (bond-angle-torsion, BAT) coordinates. Comparing predicted atomistic structures with simulated ones shows that the trained model reproduces most of the essential structural features of the phenylalanine dipeptide (FF), including bond lengths, angles, and dihedrals. BAT coordinates improve model performance by providing structured, periodic representations of molecular geometry. Mutual information analysis reveals that the predicted structures retain the same correlation patterns between degrees of freedom as observed in atomistic simulations, indicating that the model captures meaningful interdependence among molecular components. Although this work focuses on FF peptides, the approach is generalizable to other molecular systems and proteins. It offers a probabilistic framework for recovering atomistic detail that can enable a better understanding of molecular assembly.
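To make the distinction between Cartesian and BAT coordinates concrete, the following is a minimal sketch of how a bond length, a bond angle, and a torsion can be computed from four atomic positions. It is an illustration only, not the implementation used in this work: the function names (`bond`, `angle`, `torsion`, `periodic_features`) are hypothetical, the torsion sign convention is one of several in common use, and the sine/cosine encoding of the torsion is an assumed way of exposing its periodicity to a model.

```python
import numpy as np

def bond(a, b):
    """Distance between atoms a and b."""
    return float(np.linalg.norm(b - a))

def angle(a, b, c):
    """Angle a-b-c in radians, with b at the vertex."""
    u, v = a - b, c - b
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against round-off pushing |cos| slightly above 1.
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def torsion(a, b, c, d):
    """Dihedral a-b-c-d in radians, in (-pi, pi]; sign convention is an assumption."""
    b0 = a - b
    b1 = c - b
    b2 = d - c
    b1n = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1n) * b1n
    w = b2 - np.dot(b2, b1n) * b1n
    x = np.dot(v, w)
    y = np.dot(np.cross(b1n, v), w)
    return float(np.arctan2(y, x))

def periodic_features(phi):
    """Encode an angle as (sin, cos) so that -pi and +pi map to the same point."""
    return float(np.sin(phi)), float(np.cos(phi))

# Four hypothetical atom positions (arbitrary units), chosen for easy checking.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0])
c = np.array([0.0, 1.0, 0.0])
d_cis = np.array([1.0, 1.0, 0.0])     # same side as a: torsion of 0
d_trans = np.array([-1.0, 1.0, 0.0])  # opposite side: torsion of magnitude pi
d_perp = np.array([0.0, 1.0, 1.0])    # out of plane: torsion of magnitude pi/2

print(bond(a, b), angle(a, b, c), torsion(a, b, c, d_perp))
```

Unlike raw Cartesian coordinates, these three quantities are unchanged by rigid translation or rotation of the molecule, and the torsion is periodic, which is the sense in which BAT coordinates give a structured representation of molecular geometry.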