2022 Annual Meeting
(284c) Hierarchical Graph-Based Representation Drives Prediction of Stapled Peptide Drug-like Properties
In this work, we design and validate a hierarchical, graph-based model to predict and optimize the properties of stapled peptides, and we apply it to identify lead compounds for translation to in vitro models. Because the complex topology of stapled peptides is challenging to represent, we employ a message passing graph neural network (MP-GNN) to produce a machine-interpretable, fully differentiable representation. First, the atoms of canonical and non-canonical amino acids alike are represented as vectors, initialized with quantum chemical descriptors such as charge, hydrophobicity, and bond order. Then, information about local chemical environments is encoded by passing messages between atoms, updating the vector representations at each step. Next, because a peptide's properties are a function of its amino acids, each amino acid is represented as the sum of its atom vectors, and the same message passing process is repeated at the amino acid level. This encodes global information about the peptide, which in theory includes intramolecular NH3+ - COO- salt bridges between charged amino acids and π-π interactions between aromatic amino acids.
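The two-level scheme described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the update rule (sum aggregation followed by a `tanh` layer), the weight names in `params`, the toy graph, and the random features are all assumptions made for illustration, and a real model would use an automatic differentiation framework so that the representation can be trained.

```python
import numpy as np

def message_pass(h, adj, w_self, w_nbr, steps=2):
    """Run `steps` rounds of sum-aggregation message passing.

    h:   (n_nodes, d) node vectors
    adj: (n_nodes, n_nodes) symmetric 0/1 adjacency matrix
    """
    for _ in range(steps):
        messages = adj @ h  # each node receives the sum of its neighbors' vectors
        h = np.tanh(h @ w_self + messages @ w_nbr)
    return h

def hierarchical_embed(atom_feats, atom_adj, atom_to_res, res_adj, params, steps=2):
    """Atom-level passing, sum-pool atoms into residues, residue-level passing."""
    h_atoms = message_pass(atom_feats, atom_adj,
                           params["w_self_atom"], params["w_nbr_atom"], steps)
    n_atoms, n_res = h_atoms.shape[0], res_adj.shape[0]
    # pool[r, a] = 1 when atom a belongs to residue r
    pool = np.zeros((n_res, n_atoms))
    pool[atom_to_res, np.arange(n_atoms)] = 1.0
    h_res = pool @ h_atoms  # residue vector = sum of its atom vectors
    h_res = message_pass(h_res, res_adj,
                         params["w_self_res"], params["w_nbr_res"], steps)
    return h_res.sum(axis=0)  # one vector for the whole peptide

# Toy example (all values hypothetical): 6 atoms in 3 residues, feature size 4.
rng = np.random.default_rng(0)
d = 4
atom_feats = rng.normal(size=(6, d))
atom_to_res = np.array([0, 0, 1, 1, 2, 2])
atom_adj = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]:
    atom_adj[a, b] = atom_adj[b, a] = 1.0
# Residue graph: backbone edges 0-1 and 1-2, plus a staple linking residues 0 and 2.
res_adj = np.array([[0, 1, 1],
                    [1, 0, 1],
                    [1, 1, 0]], dtype=float)
params = {name: rng.normal(scale=0.5, size=(d, d))
          for name in ["w_self_atom", "w_nbr_atom", "w_self_res", "w_nbr_res"]}

peptide_vec = hierarchical_embed(atom_feats, atom_adj, atom_to_res, res_adj, params)
```

In this sketch the staple is simply an extra edge in the residue-level graph, which is one way a non-linear topology could enter the representation; how the actual model encodes the staple is not stated in the abstract.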
We demonstrate the power of this model by designing stapled peptides that target the B cell lymphoma 2 (Bcl-2) proteins, which regulate apoptosis within cells and are often overexpressed in cancer cells. Because these proteins are highly related in sequence but play distinct roles in apoptosis regulation, methods and models that optimize specificity are in great demand. We designed a library of stapled peptides that inhibit Bcl-2 proteins and therefore induce cancer cells to undergo apoptosis, screened it for desired properties on the bacterial cell surface, and generated quantitative property labels using next-generation sequencing. These labels are combined with the unsupervised MP-GNN representation of stapled peptides to train and validate the model. Using this model, we can explore the sequence-function space of stapled peptides. Finally, lead compounds are identified by optimizing peptides to the frontier of idealized target properties. Importantly, the peptides identified through modeling had improved affinity and specificity compared to those identified directly through sorting. The ability to improve the affinity, specificity, and stability of stapled peptides demonstrates the value of a model that can handle proteins with geometries beyond those of naturally occurring proteins. This model could eventually lead to better prediction and generation of similar molecules, such as cyclic peptides or glycoproteins.
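One common way to read "the frontier of idealized target properties" is as a Pareto front over predicted properties: the candidates that no other candidate beats on every property at once. The sketch below assumes that reading, which the abstract does not confirm. The function name `pareto_front` and the scores are hypothetical and are not data from the study.

```python
import numpy as np

def pareto_front(scores):
    """Return indices of non-dominated rows (higher is better in every column)."""
    scores = np.asarray(scores, dtype=float)
    keep = []
    for i, s in enumerate(scores):
        # Row j dominates row i if it is >= in all columns and > in at least one.
        at_least_as_good = np.all(scores >= s, axis=1)
        strictly_better = np.any(scores > s, axis=1)
        if not np.any(at_least_as_good & strictly_better):
            keep.append(i)
    return keep

# Made-up model predictions for five candidate peptides:
# column 0 = predicted affinity, column 1 = predicted specificity.
predicted = [
    [0.9, 0.2],
    [0.6, 0.6],
    [0.5, 0.5],
    [0.2, 0.9],
    [0.6, 0.4],
]
front = pareto_front(predicted)
print(front)  # [0, 1, 3]
```

Candidates 2 and 4 are dropped because candidate 1 matches or exceeds them on both properties; the remaining three represent different trade-offs between affinity and specificity.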