2024 AIChE Annual Meeting

(4jk) Computational Molecular Design and Informatics for Autonomous Molecular Discovery

Research Interests

The discovery of new functional molecules is a fundamental challenge in chemical science and engineering, and it is critical to addressing societal needs in health, energy, and sustainability. Traditionally, discovery has relied on extensive trial and error or on a deep mechanistic understanding that requires substantial domain knowledge. Recent advances in deep learning offer transformative potential for these processes. Deep neural networks excel at recognizing and processing complex patterns, allowing researchers to study intricate trends in historical chemical data that were previously difficult to analyze. Furthermore, the data-driven nature and strong modeling capabilities of deep generative models enable a systematic approach to molecular design. The promise of deep learning extends from molecular design applications to fundamental research in chemical science, making autonomous molecular discovery increasingly feasible.

My research group will advance methods in computer-aided molecular design and cheminformatics, focusing on deep learning algorithms. My long-term goal is to achieve autonomous molecular discovery, enabling the identification of new functional molecules that advance human health (such as new drug molecules) and environmental sustainability (such as new carbon sorbents). To realize this vision, the initial stages of my research program will address the following aims:

Aim 1: Develop new strategies to train chemical foundation models leveraging historical reaction data for more accurate property prediction.

Aim 2: Create a model-based molecular design framework that integrates uncertainty-aware models for efficient chemical space navigation.

Aim 3: Apply the developed methods to discover novel molecules that enhance health and environmental sustainability.

Prior Research

My expertise in computational chemistry, machine learning, and molecular design within the framework of modern chemical science makes me uniquely qualified to address the proposed research aims. During my doctoral research with Prof. Connor W. Coley at the Massachusetts Institute of Technology, I focused on de novo molecular design, a key decision step in autonomous molecular discovery pipelines, developing algorithms that enhance the discovery of functional molecules. A predominant theme of my contributions to date is the development of practical methods that address the key failure modes impeding the adoption of existing algorithmic solutions. Notable examples from my doctoral research include synthesizability-constrained models for generating feasible chemical compounds [1-3]; sample-efficient optimization strategies for navigating chemical space more effectively [4-6]; and model training strategies that improve molecular property prediction with limited data [7-8]. I have also invested significant effort in data collection, curation, and model benchmarking, which led to the Therapeutics Data Commons (TDC), catalyzing data-driven therapeutic discovery solutions [9-10]. Overall, my work has advanced systematic molecular discovery methodologies, paving the way for more efficient and autonomous workflows in the field.

References

  1. Gao, W., & Coley, C. W. (2020). The synthesizability of molecules proposed by generative models. Journal of Chemical Information and Modeling, 60(12), 5714-5723.
  2. Gao, W., Mercado, R., & Coley, C. W. (2021). Amortized Tree Generation for Bottom-up Synthesis Planning and Synthesizable Molecular Design. In International Conference on Learning Representations.
  3. Luo, S., Gao, W., Wu, Z., Peng, J., Coley, C. W., & Ma, J. (2024). Projecting Molecules into Synthesizable Chemical Spaces. In International Conference on Machine Learning. PMLR.
  4. Fu, T., Gao, W., Xiao, C., Yasonik, J., Coley, C. W., & Sun, J. (2021). Differentiable Scaffolding Tree for Molecule Optimization. In International Conference on Learning Representations.
  5. Gao, W., Fu, T., Sun, J., & Coley, C. (2022). Sample efficiency matters: a benchmark for practical molecular optimization. Advances in Neural Information Processing Systems, 35, 21342-21357.
  6. Fu, T., Gao, W., Coley, C., & Sun, J. (2022). Reinforced genetic algorithm for structure-based drug design. Advances in Neural Information Processing Systems, 35, 12325-12338.
  7. Tynes, M., Gao, W., Burrill, D. J., Batista, E. R., Perez, D., Yang, P., & Lubbers, N. (2021). Pairwise difference regression: a machine learning meta-algorithm for improved prediction and uncertainty quantification in chemical search. Journal of Chemical Information and Modeling, 61(8), 3846-3857.
  8. Gao, W., Raghavan, P., Shprints, R., & Coley, C. W. (2024). Substrate Scope Contrastive Learning: Repurposing Human Bias to Learn Atomic Representations. arXiv preprint arXiv:2402.16882.
  9. Huang, K., Fu, T., Gao, W., Zhao, Y., Roohani, Y., Leskovec, J., ... & Zitnik, M. (2021). Therapeutics Data Commons: Machine Learning Datasets and Tasks for Drug Discovery and Development. Advances in Neural Information Processing Systems.
  10. Huang, K., Fu, T., Gao, W., Zhao, Y., Roohani, Y., Leskovec, J., ... & Zitnik, M. (2022). Artificial intelligence foundation for therapeutic science. Nature Chemical Biology, 18(10), 1033-1036.

Teaching Interests

I view teaching as crucial to academic development and to sustaining a vibrant community. Recognizing its importance, I earned a Graduate Teaching Certificate from the MIT Teaching+Learning Lab and have actively participated in teaching courses both within and outside the university, including developing and offering a 6-unit for-credit course on Machine Learning for Molecular Design at MIT (6.S085). I am capable of teaching any core class in the Chemical Engineering curriculum, with a particular interest in numerical methods. I can also teach adjacent subjects in chemistry that align with my research, such as physical chemistry, as well as machine learning courses ranging from introductory levels to advanced topics like geometric deep learning and large language models. Additionally, I am keen to develop new courses that integrate machine learning with chemical engineering and chemistry, addressing the growing need for expertise in this area in both research and industry.

Selected Honors and Awards

2024: ACS CINF Scientific Excellence Award

2024: CAS Future Leaders Top-100

2024: D. E. Shaw Research Doctoral Fellowship

2023: Google PhD Fellowship

2022: Takeda Fellowship

2016: National University Student Innovation Program Grant awarded by the Ministry of Education, China

2016: May Fourth Scholarship awarded by Peking University, China