2021 Annual Meeting
(305c) RGN2: Single-Sequence Protein Structure Prediction with Applications in Protein Design and Novel Biomaterials
Recent advances in protein modeling (e.g., AlphaFold2) have made it possible to predict protein structures with high fidelity from alignments of homologous protein sequences, at significant computational cost. While these systems are groundbreaking, three outstanding challenges remain unaddressed: (i) prediction of structure from individual sequences, necessary for orphan proteins, de novo design, rapidly evolving proteins, and modeling genetic variation, (ii) fast prediction, necessary for protein design and whole-proteome analyses, and (iii) scientific understanding of the sequence-to-structure relationships that underpin protein folding. Here we report RGN2, an end-to-end differentiable system for predicting protein structure from single protein sequences. RGN2 maps protein sequences to latent representations learned by a self-supervised sequence modeling task, then uses these learned representations to predict protein structure in a differentiable manner. To improve accuracy, we augment RGN2 with physics-based refinement at the cost of additional computation. Because it does not need to derive protein sequence alignments, RGN2 provides a million-fold gain in prediction speed over the publicly available trRosetta system when no physics-based refinement is used and a 30-fold gain with refinement. We assessed RGN2 accuracy by predicting structures of proteins with no homologous sequences available (196 natural and 35 de novo designed proteins) and observed that RGN2 outperforms trRosetta in both instances, despite trRosetta having been used to design the de novo proteins. To our knowledge, this is the first end-to-end differentiable system for predicting protein structure from individual sequences, devoid of any explicit form of evolutionary information, and it provides an alternate route to accurate and fast protein structure prediction.