Metal-organic frameworks (MOFs) are a class of porous crystalline materials built from coordinated units of metal clusters and organic linkers. Common properties of MOFs include high porosity and surface area as well as tunable pore size and topology, which are desirable for many applications, including gas adsorption. A near-limitless number of building blocks can be used to design MOFs with a wide variety of structures and properties, and by rationally selecting which of these molecular blocks to use, properties such as pore size can be tuned for individual gases. With so many possible combinations, it is likely that there are high-performing MOFs that have yet to be realized. An additional benefit of MOFs is that their reticular nature allows crystal structures to be generated and evaluated in silico. Detailed molecular simulations have been used to accurately predict gas adsorption in MOFs, but their high computational cost is prohibitive for high-throughput screening. In this work, we replace these simulations with random forest (RF) machine learning. While MOFs have shown promise experimentally in gas storage applications, more research is needed to assess the performance of the many MOFs that have yet to be realized.
The goal of this work was to develop a combined genetic algorithm and random forest model to discover novel high-performing metal-organic frameworks for gas adsorption. Due to the sheer number of MOF building blocks available, a near-limitless number of unique MOFs have yet to be realized. Given the promising performance of known MOFs, it is highly probable that several of these unrealized MOFs are high-performing methane adsorbers. To find them, we developed the GARF model, which combines a genetic algorithm (GA) with random forest (RF) machine learning. By strategically selecting the features used to predict methane adsorption, we can develop an evolutionary algorithm that accurately and efficiently discovers high-performing MOFs for methane adsorption.
Using minimal input information, we developed an integrated genetic algorithm and random forest machine learning (GARF) model to design and screen high-performing MOFs for gas adsorption. Using a combination of structural, chemical, and crystal descriptors, we were able to predict methane adsorption rapidly and accurately. We trained the RF models on 80% of 50,000 hypothetical MOFs (hMOFs) from the MOFXDB database and tested them on the remaining 20%, achieving an R² value of 0.92 and a mean absolute percentage error (MAPE) of 10.2%. To intelligently screen hundreds of thousands of MOFs, we implemented a genetic algorithm (GA), which uses the principles of recombination and mutation to evolve solutions to a problem. The input to the GA is encoded in a data structure known as a chromosome. Each chromosome represents one hypothetical MOF and contains only four building blocks and two pieces of crystal information. From these, a chemical formula can be generated, from which many chemical properties can be calculated. We also replaced the molecular simulations needed to calculate structural properties with RF machine learning, achieving high R² and relatively low MAPE values for each of the six structural properties predicted. With RF machine learning as the fitness evaluator for the GA, the GARF model can screen 250,000 hypothetical MOFs in minutes on a personal computer. Even with the 50 highest-performing MOFs in the database excluded from the training set, the GARF model evolved a high-performing MOF equivalent to the eighteenth-best MOF of the 50,000 hMOFs. This finding validates the GARF model and will allow us to extend it to rapidly and effectively predict the adsorption of other gases and other intrinsic properties of MOFs or other reticular materials.
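To illustrate the chromosome encoding and the recombination and mutation steps described above, a minimal Python sketch follows. It is not the GARF implementation. The building-block names, the reading of "four building blocks" as one metal cluster, two linkers, and one functional group, the choice of topology and catenation as the two pieces of crystal information, and the GA settings are all assumptions made for illustration. The `toy_fitness` function is a hypothetical stand-in for the trained RF predictor.

```python
import random

# Hypothetical gene pools; the actual building-block lists are not given here.
METAL_CLUSTERS = ["Zn4O", "Cu2", "Zr6O4", "V3O3"]
LINKERS = ["BDC", "BTC", "BPDC", "NDC", "TPDC"]
FUNCTIONAL_GROUPS = ["H", "NH2", "OH", "CH3", "F"]
TOPOLOGIES = ["pcu", "fcu", "tbo"]   # assumed crystal descriptor 1
CATENATION = [0, 1, 2]               # assumed crystal descriptor 2

# One chromosome = 4 building blocks + 2 pieces of crystal information.
GENE_POOLS = [METAL_CLUSTERS, LINKERS, LINKERS, FUNCTIONAL_GROUPS,
              TOPOLOGIES, CATENATION]


def random_chromosome(rng):
    """Draw one gene from each pool to form a hypothetical MOF."""
    return tuple(rng.choice(pool) for pool in GENE_POOLS)


def crossover(parent_a, parent_b, rng):
    """Single-point recombination: head of parent_a, tail of parent_b."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(chromosome, rng, rate=0.1):
    """Resample each gene from its own pool with probability `rate`."""
    return tuple(
        rng.choice(pool) if rng.random() < rate else gene
        for gene, pool in zip(chromosome, GENE_POOLS)
    )


def toy_fitness(chromosome):
    """Placeholder score (sum of pool indices), NOT a real adsorption model.

    In a GARF-style workflow this would be replaced by the RF prediction.
    """
    return sum(pool.index(gene) for gene, pool in zip(chromosome, GENE_POOLS))


def evolve(fitness, pop_size=40, generations=30, elite=4, seed=0):
    """Evolve a population with truncation selection and elitism."""
    rng = random.Random(seed)
    population = [random_chromosome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]      # keep the fitter half as parents
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        population = ranked[:elite] + children  # elites survive unchanged
    return max(population, key=fitness)


best = evolve(toy_fitness)
print(best, toy_fitness(best))
```

Because the elite chromosomes are carried over unchanged, the best fitness in the population never decreases from one generation to the next. In a workflow like the one described here, the fitness call would instead return the RF-predicted methane adsorption computed from descriptors derived from the chromosome.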
We then used the information obtained about the building-block chemistries of high-performing MOFs to discover novel materials. Building on the results of previous iterations of the model and on the literature, we identified promising candidate building blocks and added a new metal cluster, organic linker, and functional group to the list of building blocks used as input to the model. GARF was then run to screen for novel high-performing MOFs. The adsorption of the high-performing materials was further validated using grand canonical Monte Carlo (GCMC) simulations. The GARF model is a valuable tool for accelerating materials discovery.