2024 AIChE Annual Meeting

(4bd) Machine Learning Innovations in Biocatalysis and Protein Engineering

Research Interests

Functional molecules are essential for addressing societal challenges in energy, sustainability, and health. However, synthesizing large, complex molecules with multiple stereocenters remains a significant challenge. Biocatalysis with enzymes has been widely used to address it. Enzymes, nature’s catalysts, offer several advantages over conventional chemical catalysts, including higher catalytic activity and selectivity and the ability to operate under mild conditions. Enzymes are therefore often the preferred choice for chemo-, regio-, or stereoselective reactions, enabling more sustainable production processes at both laboratory and industrial scales. Although many studies have successfully used enzymes for large-scale organic synthesis, obstacles to their routine use in synthesis remain. For example, the functions of many enzymes have not yet been characterized, and wild-type enzymes do not always meet researchers’ needs. My doctoral research has therefore focused on developing machine learning models that address these limitations. At the University of Illinois at Urbana-Champaign, under the guidance of Prof. Huimin Zhao, I have designed machine learning models for predicting enzyme function, facilitating enzyme engineering, and predicting the bioactivity of molecules.

In my first major project, I developed a machine learning (ML) model named CLEAN (Contrastive Learning Enabled Enzyme Function Annotation) to predict enzyme functions from amino acid sequences. Using contrastive learning, I trained a model that builds on a protein language model to embed raw amino acid sequences. CLEAN outperformed all earlier state-of-the-art models in predicting enzyme EC numbers in silico. In addition, we used halogenases as a case study to validate CLEAN's performance in vitro. I have also been developing an AI-driven autonomous protein engineering platform and an ML model that predicts the antibiotic activity of chemical compounds against Gram-negative bacteria. Manuscripts describing these two ongoing projects are in preparation. As an aspiring independent principal investigator, I will build on my doctoral work in my future research, focusing on developing computational tools for synthetic biology. I aim to expand the capabilities and applications of machine learning and protein language models.
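
To make the contrastive-learning idea behind sequence-based function prediction concrete, the following is a minimal illustrative sketch, not the CLEAN implementation. It assumes one common contrastive objective (a triplet margin loss) and a nearest-centroid rule for assigning an EC number; whether CLEAN uses exactly these choices is not stated here. The 2-D vectors are hand-written stand-ins for protein language model embeddings, the EC numbers serve only as toy labels, and the function names are hypothetical.

```python
import math

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style contrastive loss for one triplet of embeddings.

    The loss is zero once the anchor is closer to the positive (same label)
    than to the negative (different label) by at least `margin`.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

def ec_centroids(embeddings, ec_labels):
    """Average the embeddings that share an EC label (hypothetical helper)."""
    grouped = {}
    for vec, ec in zip(embeddings, ec_labels):
        grouped.setdefault(ec, []).append(vec)
    return {
        ec: [sum(column) / len(vecs) for column in zip(*vecs)]
        for ec, vecs in grouped.items()
    }

def predict_ec(query, centroids):
    """Assign the label whose centroid is nearest to the query embedding."""
    return min(centroids, key=lambda ec: math.dist(query, centroids[ec]))

# Toy data: made-up 2-D "embeddings" standing in for learned representations.
train = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.9], [0.9, 1.0]]
labels = ["1.1.1.1", "1.1.1.1", "2.7.1.1", "2.7.1.1"]

centroids = ec_centroids(train, labels)
predicted = predict_ec([0.95, 0.95], centroids)

# A well-separated triplet incurs no loss; a mislabeled one is penalized.
loss_separated = triplet_margin_loss(train[0], train[1], train[2])
loss_violated = triplet_margin_loss(train[0], train[2], train[1])
```

In a real setting the embeddings would come from a trained encoder and the loss would be minimized over many triplets by gradient descent; the sketch only shows what the objective rewards and how a learned embedding space can be queried.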

Teaching Interests

Witnessing my students and mentees grow and succeed has been immensely rewarding throughout my Ph.D. training. During my undergraduate and graduate studies, I served as a teaching assistant for a total of eight semesters, teaching core chemical engineering courses, design courses, and electives. These experiences have given me the confidence to lead both core and elective chemical engineering classes. My research has also positioned me well to teach computational methods courses, including computational analysis, data science, and machine learning for chemical engineering. I am passionate about designing new curricula that explore the application of computational tools in chemical engineering and about exploring how students can benefit from large language models.