2020 Virtual AIChE Annual Meeting
(106e) The Open Catalyst Project Dataset
Authors
Zachary Ulissi - Presenter, Carnegie Mellon University
  Junwoong Yoon, Carnegie Mellon University
  Kevin Tran, Carnegie Mellon University
  
  Aini Palizhati, Carnegie Mellon University
  Javier Heras-Domingo, Carnegie Mellon University
  Abhishek Das, Georgia Tech
  Devi Parikh, Georgia Tech and Facebook AI Research
  Lowik Chanussot, Facebook AI Research
  Siddharth Goyal, Facebook AI Research
  Caleb Ho, Facebook AI Research
  Thibaut Lavril, Facebook AI Research
  Morgane Riviere, Facebook AI Research
  C. Lawrence Zitnick, Facebook AI Research
      The Open Catalyst Project aims to develop new ML methods and models that accelerate catalyst simulation for renewable energy technologies and improve our ability to predict activity and selectivity across catalyst compositions. To achieve this in the short term, we need participation from the ML community in solving key challenges in catalysis. One path to this interaction is the development of grand challenge datasets that are representative of common challenges in catalysis, large enough to excite the ML community, and large enough to take advantage of and encourage advances in deep learning models. Similar datasets have had a large impact in small-molecule drug discovery, organic photovoltaics, and inorganic crystal structure prediction. We present the first open dataset from this effort, covering thermochemical intermediates across stable multi-metallic and p-block doped surfaces. The dataset includes full-accuracy DFT calculations across 53 elements and their binary and ternary materials, over various low-index facets. Adsorbates span 56 common reaction intermediates relevant to carbon, oxygen, and nitrogen thermal and electrochemical reactions. Off-equilibrium structures are also generated and included to aid in the design and fitting of machine learning force fields. Collectively, this is the largest systematic dataset that bridges organic and inorganic chemistry, and it will enable a new generation of catalyst structure/property relationships. We will discuss fixed train/test splits that represent common chemical challenges, along with an open challenge website, both intended to encourage competition and buy-in from the ML community.