(4hf) Machine Learning and Computational Tools for Molecular Properties and Reaction Systems
2021 Annual Meeting, Meet the Candidates Poster Sessions: Meet the Faculty and Post-Doc Candidates Poster Session
Machine learning (ML) is a powerful tool that has proven effective at solving otherwise intractable problems in many scientific domains. Recent work applying ML to chemistry has yielded significant results by drawing complex features from representations of the molecular graph. I will develop systems that improve ML treatment of chemical properties by taking advantage of the unique context of molecular features. Specifically, I will incorporate boosting (a synthesis of multiple sequential ML models) into broadly accessible machine learning software. In boosting, each model stage draws on the input features to reduce the residual errors remaining after the previous stage. In many contexts, the features available at each stage are the same, but this is not so with the molecular graph. Chemical systems are uniquely well suited to boosting because the complexity of molecular graph features means that new information (or new combinations of data) can be used at each stage. Boosting in chemical systems presents the opportunity to build a model that incorporates all of the available feature information instead of relying on tools for curating the best subset of it. This new approach will be applied both to properties of great importance for which reliable models are still being developed (such as whole pH-range pKa) and to properties for which models have already been trained but where improvements may still be attainable.
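As a rough illustration of the stage-wise idea described above, the following toy sketch fits each stage to the residuals left by the previous one, and lets each stage see a different feature. Everything in it is an assumption made for illustration: the function names `fit_line` and `boost`, the synthetic data, the one-variable linear stage model, and the rotation through feature blocks. It is not the proposed software, and real molecular-graph features would be far richer.

```python
import random


def fit_line(x, r):
    """Least-squares slope and intercept for residuals r against one feature x."""
    n = len(x)
    mx = sum(x) / n
    mr = sum(r) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxr = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r))
    slope = sxr / sxx if sxx > 0 else 0.0
    return slope, mr - slope * mx


def boost(feature_blocks, y, learning_rate=0.5, rounds=20):
    """Stage-wise boosting: stage k fits the current residuals using
    feature block k % len(feature_blocks) (a hypothetical rotation rule)."""
    n = len(y)
    pred = [0.0] * n
    stages = []   # (block index, slope, intercept) for each stage
    history = []  # training mean squared error after each stage
    for k in range(rounds):
        idx = k % len(feature_blocks)
        x = feature_blocks[idx]
        resid = [yi - pi for yi, pi in zip(y, pred)]
        slope, intercept = fit_line(x, resid)
        # Add a damped copy of the stage model to the running prediction.
        pred = [p + learning_rate * (slope * xi + intercept)
                for p, xi in zip(pred, x)]
        stages.append((idx, slope, intercept))
        history.append(sum((yi - pi) ** 2 for yi, pi in zip(y, pred)) / n)
    return stages, history


# Synthetic data (illustrative only): the target depends on two descriptors.
random.seed(0)
a = [random.uniform(-1, 1) for _ in range(200)]
b = [random.uniform(-1, 1) for _ in range(200)]
y = [2.0 * ai - 3.0 * bi + random.gauss(0, 0.1) for ai, bi in zip(a, b)]

stages, history = boost([a, b], y)
print(f"MSE after stage 1: {history[0]:.3f}; after stage {len(history)}: {history[-1]:.3f}")
```

Because each stage is a least-squares fit to the current residuals and the learning rate lies between 0 and 1, the training error cannot increase from one stage to the next in this sketch.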
I will also develop systems and tools for the continued advancement of uncertainty quantification in ML applications to chemical systems. ML models often suffer from a lack of context and interpretability, producing black-box predictions for the user. Even with whole-model descriptions of performance and error, the level of uncertainty for individual predictions may vary widely without any indication to the user. Several methods of uncertainty quantification already exist, and all can be calibrated to perform appropriately on average over a whole dataset. However, this overall behavior masks shortcomings in resolution when subsets of the data are considered. I will develop and apply metrics for scoring uncertainty quantification methods in order to guide improvements to these calculations. Additionally, I will develop techniques to separate and quantify the different sources of error (noise, model bias, model variance) as part of uncertainty quantification; the mixing of these sources is an inconsistency that existing techniques do not address. In tandem with developing ML structures for chemistry models and uncertainty prediction, I will incorporate these advances into user-accessible software. Creating a system that can be used productively even by non-experts in ML will help scientists in a wide range of fields benefit from the power of these tools.
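The point about average calibration hiding subset behavior can be shown with a small constructed example. The sketch below assumes one simple calibration check, the mean of squared standardized errors (which should be near 1 when predicted uncertainties match actual errors). The function name `mean_squared_z` and the data are hypothetical, and this is not one of the scoring metrics proposed above.

```python
import math


def mean_squared_z(errors, sigmas):
    """Average of (error / predicted sigma)^2; near 1 for calibrated uncertainties."""
    return sum((e / s) ** 2 for e, s in zip(errors, sigmas)) / len(errors)


# Hypothetical data: the model reports sigma = 1.0 for every prediction,
# but the true error magnitude differs between two subsets.
errors_a = [0.5, -0.5, 0.5, -0.5]   # subset A: errors smaller than reported
big = math.sqrt(1.75)
errors_b = [big, -big, big, -big]   # subset B: errors larger than reported
sigmas = [1.0] * 4

overall = mean_squared_z(errors_a + errors_b, sigmas * 2)
subset_a = mean_squared_z(errors_a, sigmas)
subset_b = mean_squared_z(errors_b, sigmas)
print(f"overall: {overall:.2f}, subset A: {subset_a:.2f}, subset B: {subset_b:.2f}")
```

Taken over the whole dataset the score looks calibrated, while subset A is overcautious and subset B is overconfident, which is the kind of resolution failure a whole-dataset check cannot reveal.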
In addition to ML tools, I will also study ways to improve the treatment of complex chemical mechanisms. Many of the most interesting reaction systems comprise very large sets of reactions. Tools such as Reaction Mechanism Generator (RMG) allow for procedural generation and study of these systems. Large mechanisms frequently contain subsets of reactions that are semi-equilibrated, which makes the system stiff and the calculations needed to solve it inefficient. Existing tools de-stiffen these systems through quasi-steady-state assumptions and species lumping. Such lumping tools are important for enabling large-system calculations at any level of available computational resources. However, these techniques are often non-invertible and difficult to interpret. I will use time-constant analysis to perform an analogous lumping function, but with greater interpretability for the user, dynamic time-constant setting, and complete invertibility. Dynamic time-constant adjustment will allow the lumping structure to change appropriately with temperature and reactant concentration across different reaction conditions and, in some cases, to shift over the course of a single simulation.
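A minimal sketch of what a time-constant lumping criterion could look like for a single first-order reversible step A <-> B is given below. It is an assumed toy model, not the method proposed above: the function names, the Arrhenius parameters, and the resolution threshold `t_resolve` are all hypothetical. The relaxation time of such a step is 1/(kf + kr); when it is much shorter than the time scale of interest, A and B can be pooled, and the pool can be split back using the equilibrium ratio.

```python
import math

R = 8.314  # gas constant, J/(mol K)


def arrhenius(prefactor, ea, temperature):
    """Rate constant from an Arrhenius expression (ea in J/mol)."""
    return prefactor * math.exp(-ea / (R * temperature))


def relaxation_time(kf, kr):
    """Relaxation time constant of a first-order reversible step A <-> B."""
    return 1.0 / (kf + kr)


def should_lump(kf, kr, t_resolve):
    """Pool A and B if they equilibrate faster than the time scale of interest."""
    return relaxation_time(kf, kr) < t_resolve


def lump(conc_a, conc_b):
    """Pooled concentration of the lumped pseudo-species."""
    return conc_a + conc_b


def unlump(pool, kf, kr):
    """Recover A and B from the pool, assuming the pair is at equilibrium."""
    k_eq = kf / kr
    return pool / (1.0 + k_eq), pool * k_eq / (1.0 + k_eq)


# Hypothetical Arrhenius parameters (s^-1 and J/mol) and resolution time (s).
t_resolve = 1e-2
for temperature in (300.0, 600.0):
    kf = arrhenius(1e10, 80e3, temperature)
    kr = arrhenius(1e10, 70e3, temperature)
    tau = relaxation_time(kf, kr)
    print(f"T = {temperature:.0f} K: tau = {tau:.3g} s, lump = {should_lump(kf, kr, t_resolve)}")
```

With these assumed parameters the pair is too slow to lump at 300 K but qualifies at 600 K, illustrating how a lumping structure might change with conditions. The inversion in `unlump` is exact only to the extent that the pair really is equilibrated.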
I will also study individual chemical mechanisms, in addition to the general handling of reaction systems, with a focus on organometallic systems. The world of chemical manufacturing is filled with chemistries that are widely used and well studied, yet not understood at the level of a detailed elementary mechanism. These reactions may already be run with great efficiency, but their full potential cannot be assessed without a fuller understanding of the elementary reactions at play. Such understanding allows a practitioner to better tune reagents and conditions. A full understanding of the mechanism could also open up new pathways in analogous systems or provide solutions for unwanted byproducts. Recent advances in quantum chemistry calculations have opened the way to study these long-used mechanisms. Techniques like multistructure transition state theory will allow for the treatment of unusual vibrational modes that are difficult to approximate as simple hindered rotors. Approximations of solution effects can now be performed with more resolution using QM/MM techniques that represent solvent molecules directly in place of polarizable field models. Barrierless reactions can be represented with variable reaction coordinate transition state theory. The combination of advances in these three areas makes elementary study of organometallic mechanisms possible at a higher quality than ever before.
Teaching Interests
My teaching philosophy focuses instruction on student achievement of the course's and program's ultimate goals. This entails considering carefully how learning objectives along the way will lead to those ultimate goals and capabilities, and designing assessments and lectures around them. Framing the components of a class so that students know what goals they lead to helps align the instructor's goals for the class with the students'. The approach also involves recognizing where the students are in terms of content and skills from other courses so that the appropriate emphasis can be placed on the learning objectives they need to advance. I intend to apply this approach to both undergraduate and graduate courses.