The phenylalanine dipeptide (FF) exhibits polymorphic self-assembly, forming nanostructures such as nanotubes and vesicles via distinct assembly mechanisms. These structures have numerous promising applications in nanotechnology and biomedicine, but understanding how they emerge requires insight into the molecular-level interactions that govern their formation. Because self-assembly spans multiple length and time scales, from fast atomic fluctuations to slow mesoscale organization, multiscale simulations are essential for a complete description of its dynamics. Coarse-grained (CG) molecular simulations make these systems computationally tractable, but at the cost of critical atomistic structural information. A key challenge is therefore backmapping: accurately reconstructing atomistic structures from CG representations. Conventional backmapping approaches recover atomistic detail but often fail to capture the inherent randomness and conformational variability of molecular systems.
In this work, we present a machine learning-based backmapping approach that reconstructs atomistic structures from CG representations. Our decoder model integrates geometric algebra attention, masked normalizing flows, and neural network architectures; it is trained on atomistic simulation trajectories and learns to predict full atomic coordinates from CG positions. We trained separate models to predict either Cartesian (XYZ) or internal (bond-angle-torsion, BAT) coordinates. Comparison of the predicted atomistic structures with the simulated ones shows that the trained models reproduce most of the essential structural features of the FF peptide, including bond lengths, angles, and dihedrals. BAT coordinates improve model performance by providing structured, periodic representations of molecular geometry. Mutual information analysis reveals that the predicted structures retain the same correlation patterns between degrees of freedom as observed in the atomistic simulations, indicating that the model captures meaningful interdependence among molecular components. Although this work focuses on FF peptides, the approach is generalizable to other molecular systems and proteins, and it offers a probabilistic framework for recovering atomistic detail that can enable a better understanding of molecular assembly.
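
To make the distinction between Cartesian and BAT coordinates concrete, the following minimal Python sketch computes one bond length, one bond angle, and one torsion from the Cartesian positions of four atoms. It is an illustration only, not the model or preprocessing used in this work: the function names (`bond_length`, `bond_angle`, `torsion`, `periodic_features`) and the example coordinates are hypothetical, and the (sin, cos) encoding of the torsion is an assumed, commonly used way of handling periodicity rather than something stated above.

```python
import numpy as np


def bond_length(a, b):
    """Distance between atoms a and b."""
    return float(np.linalg.norm(b - a))


def bond_angle(a, b, c):
    """Angle a-b-c (vertex at b), in radians."""
    u = a - b
    v = c - b
    cos_t = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip guards against round-off pushing the cosine outside [-1, 1].
    return float(np.arccos(np.clip(cos_t, -1.0, 1.0)))


def torsion(a, b, c, d):
    """Signed torsion a-b-c-d in radians, in the range [-pi, pi]."""
    b1 = b - a
    b2 = c - b
    b3 = d - c
    n1 = np.cross(b1, b2)  # normal to the plane through a, b, c
    n2 = np.cross(b2, b3)  # normal to the plane through b, c, d
    x = np.dot(n1, n2)
    y = np.dot(b2 / np.linalg.norm(b2), np.cross(n1, n2))
    return float(np.arctan2(y, x))


def periodic_features(phi):
    """Assumed encoding: map an angle to (sin, cos) so that -pi and +pi coincide."""
    return np.array([np.sin(phi), np.cos(phi)])


# Hypothetical four-atom fragment (arbitrary units), chosen for easy checking.
a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 0.0, 0.0])
c = np.array([0.0, 0.0, 1.0])
d = np.array([0.0, 1.0, 1.0])

bat = {
    "bond_bc": bond_length(b, c),
    "angle_abc": bond_angle(a, b, c),
    "torsion_abcd": torsion(a, b, c, d),
}

# Two torsions that are 2 degrees apart on the circle but far apart as raw numbers.
phi_plus = np.deg2rad(179.0)
phi_minus = np.deg2rad(-179.0)
raw_gap = abs(phi_plus - phi_minus)
encoded_gap = float(np.linalg.norm(periodic_features(phi_plus) - periodic_features(phi_minus)))
```

In this sketch, `raw_gap` is close to 2π while `encoded_gap` is small, which illustrates why a representation that respects the periodicity of torsions can be easier to learn than raw angle values or unconstrained Cartesian coordinates.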