2024 AIChE Annual Meeting
(594g) Autonomous Protein Engineering through Robotics, Machine Learning, and Large Language Model
Authors
Singh, N., Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign
Lane, S. T., University of Illinois at Urbana-Champaign
Zhao, H., University of Illinois-Urbana
Proteins, the hallmarks of life, play a critical role in modern industries such as biotechnology, food, and pharmaceuticals. Protein engineering seeks variants with improved properties. Propelled by advances in automation, machine learning (ML), and large language models (LLMs), protein engineering is entering a new era. Our study introduces a pioneering approach to protein engineering that leverages these technologies to create a highly efficient, robust, and adaptable system. By integrating robotic automation with ML and LLMs, we establish an autonomous protein engineering platform that significantly reduces the need for human intervention, judgment, and domain expertise.

Our system starts by predicting initial mutants using a protein LLM [1] and evolutionary information [2]. The initial variant library is constructed, expressed, and measured using iBioFAB [3], the Illinois Biological Foundry for Advanced Biomanufacturing. The assay data are then used to train an ML model that predicts variant fitness, and this model guides mutation exploration in the next round (Figure 1). In parallel with the robotics and ML, we leverage a ChatGPT-based co-pilot to help researchers with data preparation, automation worklist generation, and other tasks.

As a case study demonstrating the efficacy of our system, we focused on halide methyltransferase (HMT). Without prior knowledge, the protein LLM identified the same single mutation reported in a previous HMT protein engineering study [4]. Over a total of 4 rounds of autonomous engineering, HMT activity improved progressively (Figure 2). After the final round, we identified a variant with more than 5-fold higher activity than wild-type HMT. In conclusion, our system iterates on experimental feedback, taking humans almost entirely out of the loop.
This novel methodology not only accelerates the protein engineering process but also enhances its adaptability across various proteins.
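The abstract does not specify which fitness model is trained or how the next round's mutations are chosen. As an illustration only, the round-to-round step can be sketched in Python under two assumptions: an additive model (one learned effect per mutation, fit by ridge regression via coordinate descent) and a greedy strategy that proposes the unmeasured combinations with the highest predicted fitness. The functions `fit_additive_model`, `predict`, and `propose_next_round`, the mutation label format (e.g. "A10V"), and the toy data are hypothetical and are not taken from the study.

```python
from itertools import combinations


def position(mutation):
    """Residue index of a label such as "A10V" (hypothetical label format)."""
    return int(mutation[1:-1])


def fit_additive_model(data, lam=1.0, n_sweeps=200):
    """Ridge-regularized additive fitness model fit by coordinate descent.

    data: list of (variant, fitness), where variant is a frozenset of mutation labels.
    Assumed model: fitness ~ b + sum(w[m] for m in variant); only w[m] is penalized.
    """
    muts = sorted({m for v, _ in data for m in v})
    w = {m: 0.0 for m in muts}
    b = 0.0
    for _ in range(n_sweeps):
        # Exact update of the unpenalized intercept given the current effects.
        b = sum(y - sum(w[m] for m in v) for v, y in data) / len(data)
        for m in muts:
            # Exact ridge update of w[m], holding b and all other effects fixed.
            rows = [(v, y) for v, y in data if m in v]
            partial = sum(y - b - sum(w[k] for k in v if k != m) for v, y in rows)
            w[m] = partial / (len(rows) + lam)
    return b, w


def predict(model, variant):
    """Predicted fitness; mutations never seen in training contribute 0."""
    b, w = model
    return b + sum(w.get(m, 0.0) for m in variant)


def propose_next_round(model, measured, pool, max_order=2, batch_size=3):
    """Greedily rank unmeasured combinations (at distinct sites) by predicted fitness."""
    scored = []
    for k in range(1, max_order + 1):
        for combo in combinations(sorted(pool), k):
            if len({position(m) for m in combo}) < k:
                continue  # skip two substitutions at the same residue
            variant = frozenset(combo)
            if variant not in measured:
                scored.append((predict(model, variant), list(combo)))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [frozenset(combo) for _, combo in scored[:batch_size]]


# Toy round-1 data (invented for illustration, NOT results from the study).
data = [
    (frozenset(), 1.0),  # wild type
    (frozenset({"A10V"}), 2.0),
    (frozenset({"L20F"}), 1.5),
    (frozenset({"G30S"}), 0.5),
    (frozenset({"A10V", "G30S"}), 1.4),
]
model = fit_additive_model(data)
measured = {v for v, _ in data}
batch = propose_next_round(model, measured, {"A10V", "L20F", "G30S"})
print([sorted(v) for v in batch])  # [['A10V', 'L20F'], ['G30S', 'L20F']]
```

The purely greedy ranking is a simplification; a practical platform would likely balance exploiting high predictions against exploring uncertain regions of sequence space.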
1. Meier, Joshua, et al. "Language models enable zero-shot prediction of the effects of mutations on protein function." Advances in Neural Information Processing Systems 34 (2021): 29287-29303.
2. Hopf, Thomas A., et al. "Mutation effects predicted from sequence co-variation." Nature Biotechnology 35.2 (2017): 128-135.
3. HamediRad, Mohammad, et al. "Towards a fully automated algorithm driven platform for biosystems design." Nature Communications 10.1 (2019): 5150.
4. Tang, Qingyun, et al. "Directed evolution of a halide methyltransferase enables biocatalytic synthesis of diverse SAM analogs." Angewandte Chemie International Edition 60.3 (2021): 1524-1527.