2025 AIChE Annual Meeting
(687e) Developing a Classifier to Differentiate Binding and Non-Binding Antibody–Antigen Complexes Using EPPI Features
In this work, we expand on that prior study to identify the features and characteristics that are most important for distinguishing real antibody complexes from decoys. We began by curating a non-redundant database of antibody-protein complexes. Experimental structures of every antibody-protein complex were downloaded from the international ImMunoGeneTics information system (IMGT®) 3Dstructure-DB, and the unique variable domains and antigens in each file were identified. Structures were selected for the non-redundant database on the basis of three criteria. First, each antibody that bound to a unique antigen was included. Second, an antibody that bound to the same antigen as another antibody was included if at least one of its complementarity-determining regions (CDRs) differed in length from the corresponding CDR of the other antibody. Finally, if neither of the first two criteria was met, the antibodies had to differ from one another by at least five amino acid mutations in their CDRs. After the database was curated, decoy complexes of the antibodies with their native antigens and with other antigens in the database were created using existing complex prediction tools, including HADDOCK, ZDOCK, and AlphaFold-Multimer. This process produced a large database of real and decoy antibody-protein complexes for analysis.
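The three inclusion criteria above can be read as a pairwise distinctness test. The following is a minimal sketch of that reading, not the authors' curation pipeline. The `Antibody` record, the function names, the greedy first-come ordering in `curate`, and the position-wise mutation count are all assumptions made for illustration; it also assumes every antibody carries the same set of CDR labels.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Antibody:
    """Hypothetical record; field names are illustrative, not from the abstract."""
    name: str
    antigen: str            # identifier of the bound antigen
    cdrs: Dict[str, str]    # CDR label (e.g. "H3") -> amino acid sequence


def cdr_lengths_differ(a: Antibody, b: Antibody) -> bool:
    """True if at least one CDR differs in length between the two antibodies."""
    return any(len(a.cdrs[k]) != len(b.cdrs[k]) for k in a.cdrs)


def cdr_mutations(a: Antibody, b: Antibody) -> int:
    """Position-wise differences over all CDRs (used only when lengths match)."""
    return sum(x != y for k in a.cdrs for x, y in zip(a.cdrs[k], b.cdrs[k]))


def is_distinct(a: Antibody, b: Antibody, min_mutations: int = 5) -> bool:
    if a.antigen != b.antigen:       # criterion 1: different antigens
        return True
    if cdr_lengths_differ(a, b):     # criterion 2: some CDR differs in length
        return True
    # criterion 3: same antigen, same CDR lengths, enough CDR mutations
    return cdr_mutations(a, b) >= min_mutations


def curate(candidates: List[Antibody]) -> List[Antibody]:
    """Greedy filter (an assumption): keep an antibody only if it is
    distinct from every antibody already kept."""
    kept: List[Antibody] = []
    for ab in candidates:
        if all(is_distinct(ab, other) for other in kept):
            kept.append(ab)
    return kept
```

Because the filter is greedy, the result depends on the order of `candidates`; the abstract does not say how ties between redundant structures were resolved.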
We then developed and implemented a bespoke classifier to analyze the complexes. The classifier closely resembles a random forest classifier, with additional algorithmic steps for reevaluating prior decisions and ensuring redundancy in the selection criteria. The EPPI features were calculated for all complexes in the curated database and used to train the classifier. The output of this analysis is a set of clusters of real antibody complexes, each sharing a list of features, such that every decoy complex is excluded by two or more of those features. This talk will discuss the curation of the database of complexes, the details of the classifier algorithm, and the feature spaces of the largest clusters, which reveal details of the mechanisms by which antibodies bind to proteins.
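The acceptance condition described above (real complexes satisfy every feature in a cluster's list, and each decoy fails at least two) can be sketched as follows. This illustrates only that condition, not the classifier's training algorithm, which the abstract does not specify. Representing each feature as an inclusive numeric interval is an assumption, and the feature names in the usage example are hypothetical placeholders rather than actual EPPI features.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
# Assumed rule format: feature name -> (low, high) inclusive interval.
Rule = Dict[str, Tuple[float, float]]


def violations(features: Features, rule: Rule) -> int:
    """Number of rule features whose value falls outside the allowed interval."""
    return sum(not (lo <= features[name] <= hi) for name, (lo, hi) in rule.items())


def cluster_members(reals: List[Features], rule: Rule) -> List[int]:
    """Indices of real complexes that satisfy every feature in the rule."""
    return [i for i, f in enumerate(reals) if violations(f, rule) == 0]


def excludes_all_decoys(decoys: List[Features], rule: Rule,
                        min_violations: int = 2) -> bool:
    """True if every decoy fails at least `min_violations` features,
    which gives the redundancy in selection criteria."""
    return all(violations(f, rule) >= min_violations for f in decoys)
```

A usage example with made-up values:

```python
rule = {"buried_area": (600, 1200), "h_bonds": (4, 20), "salt_bridges": (1, 10)}
reals = [
    {"buried_area": 800, "h_bonds": 8, "salt_bridges": 2},
    {"buried_area": 500, "h_bonds": 9, "salt_bridges": 3},
]
decoys = [
    {"buried_area": 300, "h_bonds": 1, "salt_bridges": 2},
    {"buried_area": 900, "h_bonds": 2, "salt_bridges": 3},
]
print(cluster_members(reals, rule))       # indices of reals inside the cluster
print(excludes_all_decoys(decoys, rule))  # False if any decoy fails fewer than two features
```

Requiring two failed features per decoy means that no single feature is solely responsible for rejecting any decoy.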