2024 AIChE Annual Meeting

(169c) Prediction of pKa in Different Solvents Via Deep Learning

Authors

Jonathan Zheng - Presenter, Massachusetts Institute of Technology
Ivari Kaljurand, University of Tartu
Sofja Tshepelevitsh, University of Tartu
Ivo Leito, University of Tartu
Thomas Nevolianis, RWTH Aachen
Simon Müller, Hamburg University of Technology
William Green, Massachusetts Institute of Technology

The acid dissociation constant, usually reported as its negative logarithm pKa, is important in many applications, including drug discovery, chemical synthesis, and environmental studies. Most publicly available models and data compilations target aqueous pKa; resources for nonaqueous solvents are much scarcer. Yet the pKa of a compound can shift by many units between solvents, meaning its dissociation constant can change by many orders of magnitude.

The difference in a compound's pKa between two solvents can be calculated with solvation models. Combined with aqueous pKa data and an “anchor” acidity value in the target solvent, these calculations give pKa values in nonaqueous solvents. We have previously shown that this approach, used with the COSMO-RS solvation model, computes dissociation constants with mean absolute errors (MAEs) below 1 log unit in several solvents, and that it performs well for solutes including small molecules such as amino acids as well as neurotransmitter derivatives.
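One common way to write such a scheme, assumed here rather than taken from the abstract, is a thermodynamic cycle in which the poorly known transfer free energy of the proton cancels against an anchor acid HR whose pKa is known in both water (W) and the target solvent (S). Writing ΔG_tr(X) = ΔG_solv,S(X) - ΔG_solv,W(X) for each species (for example from COSMO-RS), this gives pKa_S(HA) = pKa_W(HA) + [pKa_S(HR) - pKa_W(HR)] + ([ΔG_tr(A⁻) - ΔG_tr(HA)] - [ΔG_tr(R⁻) - ΔG_tr(HR)]) / (RT ln 10). The minimal Python sketch below implements that expression and the MAE metric mentioned above. The function names are hypothetical, energies are assumed to be in kJ/mol, and any standard-state or conformer corrections the authors may apply are omitted.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol K)


def pka_in_solvent(pka_w, dg_tr_anion, dg_tr_acid,
                   anchor_pka_w, anchor_pka_s,
                   anchor_dg_tr_anion, anchor_dg_tr_acid,
                   temperature=298.15):
    """Hypothetical helper: estimate pKa in solvent S from aqueous pKa plus an anchor acid.

    Each dg_tr_* argument is a water -> S transfer free energy in kJ/mol,
    dG_solv,S(X) - dG_solv,W(X), e.g. from a COSMO-RS calculation (assumed input).
    """
    rt_ln10 = R_KJ * temperature * math.log(10)  # about 5.71 kJ/mol at 298.15 K
    shift_solute = dg_tr_anion - dg_tr_acid  # solvent effect on HA -> A- for the solute
    shift_anchor = anchor_dg_tr_anion - anchor_dg_tr_acid  # same quantity for the anchor
    # The anchor's known pKa shift absorbs the unknown proton transfer term.
    return (pka_w
            + (anchor_pka_s - anchor_pka_w)
            + (shift_solute - shift_anchor) / rt_ln10)


def mean_absolute_error(predicted, reference):
    """MAE in log units between predicted and reference pKa lists."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Illustrative numbers only (not measured or computed values):
est = pka_in_solvent(pka_w=4.0, dg_tr_anion=30.0, dg_tr_acid=-5.0,
                     anchor_pka_w=5.0, anchor_pka_s=12.0,
                     anchor_dg_tr_anion=25.0, anchor_dg_tr_acid=-4.0)
# est is about 12.05: 4.0 + 7.0 from the anchor shift + 6 kJ/mol / 5.71 kJ/mol
```

A solute whose transfer free energies match the anchor's simply inherits the anchor's pKa shift; any difference is converted to log units through RT ln 10.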

In this work, we use this method to build a dataset of computed pKa values. We also introduce previously unpublished experimental data, including datasets that have recently been, or are currently being, critically reviewed by IUPAC. Combining the computed (synthetic) and experimental datasets yields a large training corpus, which we use to train a directed message-passing neural network (D-MPNN). Finally, we evaluate the model and compare its performance against other nonaqueous pKa models.
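A D-MPNN keeps hidden states on directed bonds rather than on atoms, so a message is never passed straight back along the bond it arrived on. The pure-Python sketch below shows a forward pass following the bond-centered update rules used in Chemprop-style D-MPNNs: initialize each directed bond from its source atom and bond features, repeatedly sum incoming bond states (excluding the reverse bond), then aggregate onto atoms and sum to a molecule-level vector. Everything here is illustrative: the class name `ToyDMPNN` is hypothetical, weights are random and untrained, features are toy placeholders, and the abstract does not say how solvent identity enters the model (one option, assumed only for illustration, is a second encoder whose output is concatenated with the solute's).

```python
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def vadd(a, b):
    return [x + y for x, y in zip(a, b)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ToyDMPNN:
    """Hypothetical, untrained D-MPNN regressor (forward pass only)."""

    def __init__(self, atom_dim, bond_dim, hidden=8, depth=3, seed=0):
        rng = random.Random(seed)
        self.hidden, self.depth = hidden, depth
        self.W_i = rand_matrix(hidden, atom_dim + bond_dim, rng)  # input -> bond state
        self.W_m = rand_matrix(hidden, hidden, rng)  # message update
        self.W_a = rand_matrix(hidden, atom_dim + hidden, rng)  # bond states -> atom state
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]  # linear output head

    def forward(self, atom_feats, bonds, bond_feats):
        # Each undirected bond (v, w) becomes two directed edges, v->w and w->v.
        edges, e_feats = [], []
        for (v, w), f in zip(bonds, bond_feats):
            edges += [(v, w), (w, v)]
            e_feats += [f, f]
        # h0_vw = ReLU(W_i [x_v || e_vw])
        h0 = [relu(matvec(self.W_i, atom_feats[v] + f))
              for (v, _), f in zip(edges, e_feats)]
        h = h0
        for _ in range(self.depth - 1):
            new_h = []
            for i, (v, w) in enumerate(edges):
                # m_vw = sum of h_kv over edges k->v, excluding the reverse edge w->v
                m = [0.0] * self.hidden
                for j, (k, tgt) in enumerate(edges):
                    if tgt == v and k != w:
                        m = vadd(m, h[j])
                # h_vw = ReLU(h0_vw + W_m m_vw)
                new_h.append(relu(vadd(h0[i], matvec(self.W_m, m))))
            h = new_h
        # Atom states from all incoming bond states, then a sum readout over atoms.
        mol = [0.0] * self.hidden
        for a, x in enumerate(atom_feats):
            m_a = [0.0] * self.hidden
            for j, (_, tgt) in enumerate(edges):
                if tgt == a:
                    m_a = vadd(m_a, h[j])
            mol = vadd(mol, relu(matvec(self.W_a, x + m_a)))
        return sum(w * x for w, x in zip(self.w_out, mol))


# Toy 3-atom chain with placeholder one-hot atom features and one bond feature.
atoms = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]
bond_feats = [[1.0], [1.0]]
model = ToyDMPNN(atom_dim=2, bond_dim=1)
score = model.forward(atoms, bonds, bond_feats)  # untrained, so the value is meaningless
```

Because every aggregation is a sum over neighbors, the output does not depend on how atoms or bonds are ordered in the input, which is one reason this family of models suits molecular property prediction; training (loss, optimizer, batching) is not shown.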