2019 AIChE Annual Meeting
(411h) Using Molecular Subgraph Libraries for High-Throughput Screening and the Inverse Design Problem
First, we discuss the "representability" of large datasets using subgraphs. Specifically, we parse the PubChem database [3] (~100 million compounds) into all possible subgraphs and then calculate the number of subgraphs required to fully represent 100%, 90%, 80%, etc. of the database using subgraphs of different (and mixed) heights. Next, we perform DFT calculations with ADF [4] on the 100,000 most common subgraphs from PubChem and use this library of molecular fragments to generate sigma-profiles for high-throughput screening with COSMO-RS [5], as well as to initialize geometries and thereby accelerate DFT calculations. Finally, we discuss the "inverse design" problem: finding one or more optimal molecular structures given molecular structure/property constraints and a design objective. We address the inverse design problem using state-of-the-art Mixed-Integer (Non)Linear Programming (MILP/MINLP) techniques [6], exploiting problem structures inherent in subgraph representations of molecules. Specific applications to solvent, drug, and electronic materials design are discussed.
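The representability calculation can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy molecule format (a list of element symbols plus a list of bonded index pairs), uses a simple Weisfeiler-Lehman-style string relabeling as a stand-in for atom-centered subgraphs of a given height, and ranks subgraphs by how many molecules contain them. The function names `atom_signatures`, `coverage_curve`, and `library_size_for` are hypothetical.

```python
from collections import Counter

# Toy molecule format (an assumption for illustration): (atoms, bonds), where
# atoms is a list of element symbols and bonds is a list of (i, j) index pairs.
# Hydrogens and bond orders are ignored here.

def atom_signatures(atoms, bonds, height):
    """Return one canonical string per atom describing its neighborhood
    out to `height` bonds, via Weisfeiler-Lehman-style relabeling."""
    adj = [[] for _ in atoms]
    for i, j in bonds:
        adj[i].append(j)
        adj[j].append(i)
    sigs = list(atoms)  # height 0: the atom label itself
    for _ in range(height):
        # Each round appends the sorted signatures of the direct neighbors.
        sigs = [
            atoms[i] + "(" + ",".join(sorted(sigs[j] for j in adj[i])) + ")"
            for i in range(len(atoms))
        ]
    return sigs

def coverage_curve(molecules, height):
    """Add subgraphs to a library from most to least common and record the
    fraction of molecules whose subgraphs are all in the library."""
    mol_sigs = [set(atom_signatures(a, b, height)) for a, b in molecules]
    # Frequency = number of molecules containing each subgraph.
    freq = Counter(s for sigs in mol_sigs for s in sigs)
    ranked = [s for s, _ in sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))]
    library, curve = set(), []
    for s in ranked:
        library.add(s)
        covered = sum(1 for sigs in mol_sigs if sigs <= library)
        curve.append(covered / len(mol_sigs))
    return ranked, curve

def library_size_for(curve, target):
    """Smallest library size whose coverage reaches `target`, else None."""
    for k, frac in enumerate(curve, start=1):
        if frac >= target:
            return k
    return None

# Heavy-atom skeletons of ethane, propane, ethanol, and methanol.
molecules = [
    (["C", "C"], [(0, 1)]),
    (["C", "C", "C"], [(0, 1), (1, 2)]),
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "O"], [(0, 1)]),
]
ranked, curve = coverage_curve(molecules, height=1)
for k, (sig, frac) in enumerate(zip(ranked, curve), start=1):
    print(k, sig, frac)
```

Adding subgraphs in order of frequency is only a heuristic and does not guarantee the smallest library for a given coverage target. On a dataset the size of PubChem, the per-step coverage recount would also need to be replaced with an incremental update.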
[1] Faulon, Jean-Loup, Donald P. Visco, and Ramdas S. Pophale. "The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies." Journal of Chemical Information and Computer Sciences 43.3 (2003): 707-720.
[2] Rogers, David, and Mathew Hahn. "Extended-connectivity fingerprints." Journal of Chemical Information and Modeling 50.5 (2010): 742-754.
[3] PubChem Database. National Institutes of Health. https://pubchem.ncbi.nlm.nih.gov/
[4] ADF2018, SCM, Theoretical Chemistry, Vrije Universiteit, Amsterdam, The Netherlands, http://www.scm.com.
[5] Klamt, Andreas, Volker Jonas, Thorsten Bürger, and John C. W. Lohrenz. "Refinement and parametrization of COSMO-RS." The Journal of Physical Chemistry A 102.26 (1998): 5074-5085.
[6] Austin, Nick D., Nikolaos V. Sahinidis, and Daniel W. Trahan. "Computer-aided molecular design: An introduction and review of tools, applications, and solution techniques." Chemical Engineering Research and Design 116 (2016): 2-26.