2024 AIChE Annual Meeting

(4hd) Accurate Thermochemistry & Kinetics of Ionic Solutes with Computational Chemistry

Author

Jonathan Zheng - Presenter, Massachusetts Institute of Technology

Research Interests

My research experience and interests are in developing fast, accurate models for phenomena related to ions. I leverage machine learning, quantum-chemical calculations, and cheminformatics to build high-quality datasets, which I then use to generate chemical insights and predictive models. My experience in data science is “end-to-end”: I have expertise in data curation and cheminformatics, software development, and model creation. I have experience with quantum chemistry tools including Gaussian, COSMOtherm, TURBOMOLE, Q-Chem, VASP, and xTB / CREST. I work regularly with cheminformatics, having contributed to the packages py2opsin and RDMC. I am also a contributor to the Reaction Mechanism Generator kinetics software, Arkane (transition-state theory), and Chemprop (chemical deep learning).

In my poster, I will highlight major themes in my research under Prof. William H. Green at MIT:

(1) Solvation free energies

Despite the fundamental importance of solvation free energies of charged solutes in chemical modeling, predictive models are poor. One main reason is that their training datasets are small, with the largest containing just a few hundred data points in total (of which 112 are in water). In one project, I compiled a dataset of 273 experiment-derived solvation energies of ions in water. From the increased availability of data, structure-activity patterns became apparent, which I leveraged to develop corrections that reduced model error by approximately 66% (from 4.9 to 1.7 kcal/mol).

I am expanding on this work by including quantum-chemical calculations for other solvents, and leveraging the synthetic data to develop a predictive machine learning model. Other ongoing work includes developing large datasets and predictive models for radical and zwitterionic solutes.

(2) pKa

Working with the International Union of Pure and Applied Chemistry (IUPAC), I have curated a large aqueous pKa dataset that is the first to follow FAIR data principles, a dataset that is now incorporated into PubChem. I am also a contributor to an official IUPAC project related to compiling, curating, and correcting non-aqueous pKa data.

A separate project demonstrates that quantum chemistry (specifically COSMO-RS calculations), combined with aqueous pKa data, can be used to accurately compute pKa values in other solvents.
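The idea of combining an aqueous pKa with computed solvation data can be illustrated with a standard thermodynamic cycle: the pKa in a new solvent equals the aqueous pKa plus the net water-to-solvent transfer free energy of the dissociation reaction, divided by RT ln 10. The sketch below is only an illustration of that cycle, not necessarily the workflow used in the project. The function name, argument names, and the idea of supplying transfer free energies directly (for example from a COSMO-RS calculation) are assumptions made for this example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def pka_in_solvent(pka_water, dg_tr_acid, dg_tr_base, dg_tr_proton,
                   temperature=298.15):
    """Estimate a pKa in another solvent from the aqueous pKa.

    All dg_tr_* arguments are water-to-solvent transfer free energies in
    kcal/mol (hypothetical inputs, e.g. from a solvation model) for the
    acid HA, its conjugate base A, and the proton.
    """
    # Change in the dissociation free energy (HA -> A + H+) on moving
    # the whole reaction from water into the new solvent.
    ddg_dissociation = dg_tr_base + dg_tr_proton - dg_tr_acid
    # One pKa unit corresponds to RT ln(10) of free energy.
    return pka_water + ddg_dissociation / (R_KCAL * temperature * math.log(10))
```

At 298.15 K, RT ln 10 is about 1.36 kcal/mol, so each 1.36 kcal/mol of net destabilization of the dissociated products relative to the acid raises the pKa by one unit.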

Inconsistencies in cheminformatics have led to widespread systematic errors in data, which in turn lead to mistakes in predictive models, especially for amino acids and drug molecules. One theme of my work is clarifying these mistakes and encouraging extra effort in cleaning and curating data for machine learning models.

Ongoing research leverages the compiled data to develop predictive models for pKa in water and other solvents.

(3) Rate coefficients for ionic reactions

The effect of solvent on the rates of reactions involving charged molecules is not well understood. In this work, we compiled a set of experimental rate coefficients for 50 reactions, which we then used to benchmark quantum chemistry and transition-state theory calculations. The mean absolute error of our calculations is less than 1 log unit (with higher errors for some classes of reactions than others), suggesting that solvation models such as COSMO-RS can adequately describe solvent effects. We further showed that the accuracy of these computations is not sensitive to the underlying geometry optimization method: even GFN2-xTB geometries led to average errors within one order of magnitude.
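To make the "log unit" error metric concrete, the sketch below evaluates the Eyring equation from transition-state theory, k = (kB T / h) exp(-ΔG‡ / RT), and computes a mean absolute error on log10(k). It is a minimal illustration under stated assumptions, not the benchmark code: the function names are hypothetical, and the expression omits tunneling and standard-state corrections.

```python
import math

KB = 1.380649e-23     # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J s
R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)


def eyring_rate(dg_activation, temperature=298.15):
    """Eyring rate coefficient (s^-1) from an activation free energy in kcal/mol.

    Simplified form: no tunneling or standard-state corrections.
    """
    prefactor = KB * temperature / H
    return prefactor * math.exp(-dg_activation / (R_KCAL * temperature))


def mae_log10(k_calc, k_expt):
    """Mean absolute error between two sets of rate coefficients, in log10 units."""
    errors = [abs(math.log10(c) - math.log10(e)) for c, e in zip(k_calc, k_expt)]
    return sum(errors) / len(errors)
```

Because k depends exponentially on the barrier, an error of 1 log unit in k corresponds to an error of RT ln 10, about 1.36 kcal/mol at 298.15 K, in the activation free energy.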

Current work uses this method to generate a synthetic dataset, which will be used to train a machine learning model that predicts solvent effects.

(4) Software Development

I am a contributor to several popular open-source chemistry packages: Chemprop, Reaction Mechanism Generator, Reaction Data and Molecular Conformers (RDMC), and py2opsin.

Overall mission

I am interested in working with a research group that leverages my broad experience in computational chemistry and machine learning, including data curation / cheminformatics, software development, chemical kinetics, and model creation, to discover chemical insights that improve people's lives. I would be comfortable working and learning in a scientific field that is new to me, and I look forward to collaborating on challenging research problems.