2024 AIChE Annual Meeting

(629i) Transferable Protein Sidechain Backmappings Amenable to Reweighting

While multiscale modeling of biomolecules has become a critical tool for exploring their structure and self-assembly, backmapping from coarse-grained (CG) to fine-grained (FG), or atomistic, representations remains challenging despite recent advances in machine learning. A major hurdle, especially for machine-learning strategies, is that most backmappings only approximately recover the atomistic ensemble of interest. We demonstrate the conditions under which backmapped configurations can be reweighted to exactly recover the desired atomistic ensemble. Training a separate decoding model for each sidechain type, we then develop an algorithm based on conditional normalizing flows and geometric algebra attention that proposes backmapped configurations autoregressively. Critical for reweighting, our trained models include all hydrogen atoms in the backmapping and make the probabilities of atomistic configurations directly accessible. The models are also sensitive to each sidechain’s local environment and are transferable to any protein sequence. We anticipate that they will be particularly useful for proposing Monte Carlo moves that simultaneously rearrange entire sidechains, or sets of sidechains, as well as for generating novel all-atom protein configurations from CG models. However, we also show that reweighting is extremely challenging, despite state-of-the-art performance on recently developed metrics and the generation of configurations with low energies in atomistic protein force fields. Through detailed analysis of configurational weights, we show that machine-learned backmappings must not only generate configurations with reasonable energies but also assign correct relative probabilities to them under the generative model. These considerations are broadly important for generative modeling of atomistic molecular configurations.
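The reweighting idea can be illustrated with a minimal sketch. It assumes that, for a fixed CG configuration z, the backmapping model returns the log-probability log q(x | z) of each proposed atomistic configuration x. It further assumes that the target is the Boltzmann distribution exp(-βU(x)) of the atomistic force field restricted to configurations consistent with z. Under those assumptions, the self-normalized importance weights are w ∝ exp(-βU(x)) / q(x | z), and the Kish effective sample size (ESS) measures how evenly those weights are spread. The same log weights also give the acceptance probability when the model is used as an independence Metropolis-Hastings proposal. All function names are hypothetical, the energies and log-probabilities below are toy numbers rather than results, and this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def log_importance_weights(energies: Sequence[float],
                           log_q: Sequence[float],
                           beta: float) -> List[float]:
    """Unnormalized log weights: log w_i = -beta * U(x_i) - log q(x_i | z).

    Hypothetical helper; assumes energies U(x_i) and model log-probabilities
    log q(x_i | z) are already available for each backmapped sample x_i.
    """
    if len(energies) != len(log_q):
        raise ValueError("energies and log_q must have the same length")
    return [-beta * u - lq for u, lq in zip(energies, log_q)]


def normalize_log_weights(log_w: Sequence[float]) -> List[float]:
    """Self-normalize weights, shifting by the max log weight to avoid overflow."""
    m = max(log_w)
    unnorm = [math.exp(lw - m) for lw in log_w]
    total = sum(unnorm)
    return [w / total for w in unnorm]


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish ESS = (sum w)^2 / sum w^2.

    Equals N for uniform weights and approaches 1 when one sample dominates.
    """
    s = sum(weights)
    s2 = sum(w * w for w in weights)
    return s * s / s2


def reweighted_average(values: Sequence[float], weights: Sequence[float]) -> float:
    """Importance-weighted estimate of an observable's ensemble average."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def mh_accept_prob(log_w_current: float, log_w_proposed: float) -> float:
    """Independence Metropolis-Hastings acceptance, min(1, w(x') / w(x)).

    Valid when proposals are drawn from q(. | z) independently of the
    current atomistic state; clipping the exponent at 0 avoids overflow.
    """
    return math.exp(min(0.0, log_w_proposed - log_w_current))


if __name__ == "__main__":
    beta = 1.0  # assumption: energies expressed in units of kT
    # Toy case where q matches the Boltzmann factor up to a constant:
    # every log weight is identical, so ESS equals the sample count.
    lw = log_importance_weights([1.0, 2.0, 3.0], [-1.0, -2.0, -3.0], beta)
    w = normalize_log_weights(lw)
    print(effective_sample_size(w))
    # Toy case where one sample dominates: ESS collapses toward 1.
    print(effective_sample_size(normalize_log_weights([0.0, -50.0, -50.0])))
```

In this sketch, low energies alone do not yield useful weights. If q(x | z) misassigns relative probabilities, a few samples dominate and the ESS approaches 1, which is one way the reweighting difficulty described above can manifest.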