2021 Annual Meeting

(291a) Naming, Classifying, and Comparing Polymers in the Era of Data Science

Author

Bradley Olsen - Presenter, Massachusetts Institute of Technology

Describing the chemistry of polymer materials is extremely difficult because a polymer is an ensemble of molecules assembled by stochastic reactions, so it does not fit neatly into frameworks that developed largely around molecules with deterministic structure. The rise of data science has exacerbated these challenges: taking full advantage of new algorithms, data models, and analysis tools requires schemes for naming and processing polymer structures that are interoperable between humans and machines. In the small-molecule literature, many of these challenges have been addressed by line notations and associated chemoinformatics tools, and extensions to polymers promise a similarly large impact on our capabilities.

Recently, we developed BigSMILES, a stochastic line notation that captures polymer structures in a way directly analogous to chemical structure drawings while offering all the advantages of, and full compatibility with, the SMILES small-molecule line notation. However, BigSMILES, like a chemical structure drawing, defines only the set of possible molecules; defining their probabilities requires characterization data. To address this, we have put forward the PolyDAT schema, which links characterization data to the line notation and thereby provides a complete chemical definition of a polymer. Together, these structures enable many exciting challenges to be addressed. First, we demonstrate how polymer structures can be canonicalized, both using empirical rules and through analogy to automata in computer science. Second, we show how BigSMILES can be used to drive polymer vectorization. Third, we show how BigSMILES can form the basis of polymer similarity comparisons.
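
The split between a notation that defines the set of possible molecules and data that assigns their probabilities can be illustrated with a toy sketch. It is not the authors' method: it does not parse BigSMILES or read PolyDAT. It assumes a linear homopolymer with fixed end groups, enumerates plain SMILES strings for chain lengths 1 to `max_n`, and weights them with a most-probable (Flory) number distribution whose parameter `p` stands in for a value that would come from characterization data. All function names and parameters are hypothetical.

```python
def enumerate_chains(repeat_unit, head, tail, max_n):
    """Return {n: SMILES} for linear chains head-(repeat_unit)n-tail, n = 1..max_n.

    This is the "set of possible molecules": structure only, no probabilities.
    """
    return {n: head + repeat_unit * n + tail for n in range(1, max_n + 1)}


def most_probable_weights(p, max_n):
    """Return {n: number fraction} for x_n = (1 - p) * p**(n - 1).

    The distribution is truncated at max_n and renormalized to sum to 1.
    The parameter p is an assumed stand-in for characterization data.
    """
    raw = {n: (1.0 - p) * p ** (n - 1) for n in range(1, max_n + 1)}
    total = sum(raw.values())
    return {n: w / total for n, w in raw.items()}


def build_ensemble(repeat_unit, head, tail, p, max_n):
    """Pair each enumerated chain with its weight, ordered by chain length."""
    chains = enumerate_chains(repeat_unit, head, tail, max_n)
    weights = most_probable_weights(p, max_n)
    return [(chains[n], weights[n]) for n in sorted(chains)]


if __name__ == "__main__":
    # Poly(ethylene oxide)-like chains: HO-(CH2CH2O)n-H written as "O" + "CCO" * n.
    for smiles, weight in build_ensemble("CCO", "O", "", p=0.5, max_n=4):
        print(f"{weight:.3f}  {smiles}")
```

The enumeration alone says nothing about how much of each chain is present; only the weights, which depend on the assumed `p`, turn the set into a defined ensemble.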

Extending the initial BigSMILES grammar, we have also developed BigSMARTS, an extension of SMARTS that supports searching polymer structures. We have further demonstrated that BigSMILES is compatible with the concepts put forth in SELFIES, enabling polymers to be written in a form more amenable to use in genetic algorithms. Finally, the stochastic nature of BigSMILES makes it inherently compatible with non-covalent bonds, an advantage over deterministic line notations. We use this feature to extend BigSMILES to a wide variety of molecular constructs useful in colloidal and supramolecular materials.
