Research Interests
Computational Protein Design, Drug Discovery, Bioinformatics, Molecular Dynamics Simulations, Artificial Intelligence and Machine Learning
- Expanding protein function modeling using dynamics datasets and ensemble representations
- Developing novel computational pipelines for one-shot protein design targeting therapeutics, sensing, and other applications
- Leveraging deep learning architectures to design proteins optimized for specific applications
Developing Engineering Principles for Computational Protein Design
Recent years have seen a rapid rise in computational and machine learning (ML) approaches for protein design. While these methods have improved experimental success rates, challenges remain in distinguishing true protein interfaces from incorrect ones and in predicting function accurately. A further limitation is the lack of mechanistic interpretability, which restricts broader adoption outside the training domain. During my Ph.D. at Auburn University, I focused on addressing these challenges by developing interpretable, efficient, and accurate computational pipelines for protein design. In doing so, I gained expertise in molecular dynamics, machine and deep learning for protein design, protein docking, large dataset integration, and scientific software development. I have also developed soft skills such as interdisciplinary collaboration, mentoring, and effective scientific communication.
Antibodies, as prototypical binding proteins, served as the model system for constructing a biologically meaningful feature space describing protein interfaces. Using short molecular dynamics (MD) simulations of 20 antibody–protein complexes, we identified interaction features relevant to binding, including salt bridges, hydrogen bonds, and hydrophobic contacts. We found that only hydrogen bonds in which both residues are stabilized in the complex are likely to persist and contribute significantly to binding. This observation aligns with the principle that prepaying entropic penalties enables flexible but enthalpically favorable interactions to meaningfully impact affinity.
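The persistence idea can be illustrated with a minimal sketch. It assumes a contact is recorded as present or absent in each MD frame, and that each residue carries a flag saying whether it is stabilized in the complex. The function names, the data layout, and the 0.5 cutoff are hypothetical and are not taken from the analysis described above.

```python
def persistence(frames):
    """Fraction of MD frames in which the interaction is present."""
    return sum(frames) / len(frames) if frames else 0.0


def expected_persistent(frames, donor_stable, acceptor_stable, cutoff=0.5):
    """Hypothetical rule: treat a hydrogen bond as persistent only when both
    residues are stabilized in the complex and the bond is observed in at
    least `cutoff` of the frames."""
    return bool(donor_stable and acceptor_stable and persistence(frames) >= cutoff)


# Made-up per-frame observations for one hydrogen bond (illustration only).
hbond_frames = [True, True, False, True, True, True, False, True]

print(persistence(hbond_frames))                       # 0.75
print(expected_persistent(hbond_frames, True, True))   # True
print(expected_persistent(hbond_frames, True, False))  # False
```

In this toy rule, a bond observed in most frames is still excluded when one partner residue remains flexible, which mirrors the observation that both residues must be stabilized.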
We used these findings to define a new set of Expected Persistent Pairwise Interaction (EPPI) features. A random forest classifier trained on antibody docking poses showed that adding EPPI features to conventional macromolecular metrics—interaction energy, buried surface area, and shape complementarity—reduced false positives by two- to five-fold across various classification thresholds.
Building on this, we hypothesized that EPPI features could also improve ML-based prediction of binding affinity changes (∆∆G) upon mutation. Using the SKEMPI v2.0 database, we modeled mutant structures and computed changes in EPPI features. A random forest regressor trained on these features achieved state-of-the-art performance, with Pearson correlations of 0.723 (cross-validation) and 0.716 (blind validation).
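For reference, the Pearson correlation used to score such a regressor compares predicted and experimental ∆∆G values. The following is a minimal sketch in plain Python. The function name and the numbers are made up for illustration and are not SKEMPI data or results from this work.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Made-up experimental and predicted ddG values (kcal/mol), illustration only.
experimental = [0.2, 1.5, -0.4, 2.1, 0.9]
predicted = [0.5, 1.1, -0.1, 1.8, 1.2]

print(round(pearson(experimental, predicted), 3))
```

A value near 1 means the predicted values track the experimental ones closely in rank and scale. It does not by itself measure absolute error.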
Currently, I am translating these insights into a tool for rational protein interface redesign. Feature analysis has revealed design heuristics that can guide mutation selection to optimize binding affinity, providing a framework that is both predictive and interpretable.