2019 AIChE Annual Meeting
(560gv) Rational Catalyst Design: Kinetics Put into Action for Small Open Data
Authors
Figure 1: Data-based rational catalyst design cycle. Focus of this work: green.
Open Access catalytic data
As a starting point, the current storage of data on catalysis was evaluated. Open Access storage of data has mainly been driven by recent policies from funding bodies and publishers. Hence, to gauge its current potential, the focus should be on a reaction that has been widely investigated in the past few years. Hydrodeoxygenation has received significant attention in recent years, but catalyst design remains an open challenge 3. Well-known repositories and search engines were surveyed with a broad search covering whole reaction families under hydrodeoxygenation (Table 1). The overall number of available datasets is modest compared to the number of publications in the same field (ca. 2600 according to Web of Science at the same time), particularly if one excludes Figshare (mainly a repository of article preprints). In the other cases, not all search results were relevant (e.g. data generated by molecular modelling calculations), and the data were often unstructured and flawed.
Table 1: Number of Open Access datasets on hydrodeoxygenation (October 2018).
In summary, one cannot presently rely on Open Access data, but more data sharing, better curation and standardized formats, following the FAIR Data Principles 4, will make this possible in the near future. Furthermore, literature mining software, already employed in biology, can provide access to virtually all data generated to date. In that sense, (upcoming) Open Access data will result from the combination of individual datasets as we know them today. For a given reaction, this will yield a significant amount of data, but one that remains far from big data in both volume and balance. Conversely, the existing data visualization and inference tools, grouped under the umbrella term machine learning, have been designed for truly big data, i.e. huge amounts of well-balanced data 11. Hence, tools adapted to the considerably smaller size of catalytic data must be developed.
On the way to automate kinetic information extraction
As of today, the first step of information extraction (Figure 1) relies solely on the researcher's prior knowledge and experience. The methodology under development aims to fill this gap. In practice, this means that all kinetic features (e.g. variations in conversion) must be identified and, preferably, ranked by relevance. This can be achieved via the recognition of patterns and fingerprints 5, e.g. abrupt variations in the performance indicators. The first step is to visualize the general trends in the dataset. Typically, a researcher draws by intuition a curve that represents the overall trend in the data while acknowledging experimental error. The tool developed herein is hence meant to mimic how the researcher would draw the overall trend in a dataset.
The developed algorithm is based on the class UnivariateSpline in the SciPy module of Python 6. This class implements an iterative procedure in which the number of piecewise polynomials (which together constitute a spline) is increased until the residual sum of squares falls below the defined tolerance level. The results with fictive data (Figure 2.a and .b) indicated that this state-of-the-art procedure could not adequately reproduce simple shapes. Testing on data featuring different trends also made clear that the optimal tolerance level depends on the dataset.
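As a minimal illustration of the stopping rule described above, the following sketch fits UnivariateSpline to synthetic S-shaped data with a loose and a tight tolerance (the smoothing factor s). The data, the random seed and both values of s are assumptions chosen for illustration only; they are not the fictive data of Figure 2.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

# Synthetic S-shaped "conversion" profile with noise (illustrative only).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = 1.0 / (1.0 + np.exp(-3.0 * (x - 5.0))) + rng.normal(0.0, 0.03, x.size)


def fit_summary(s):
    """Fit a cubic smoothing spline; return (number of pieces, residual sum of squares)."""
    spline = UnivariateSpline(x, y, k=3, s=s)
    n_pieces = len(spline.get_knots()) - 1  # get_knots() includes both end points
    return n_pieces, float(spline.get_residual())


loose = fit_summary(100.0)  # tolerance far above the residual of a single cubic
tight = fit_summary(0.05)   # tolerance close to the noise level of the data

print("loose tolerance (pieces, RSS):", loose)
print("tight tolerance (pieces, RSS):", tight)
```

The tighter tolerance forces more piecewise polynomials, which is why a single fixed tolerance is unlikely to suit datasets with different shapes and noise levels.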
A new algorithm featuring a lower tolerance was thus developed. To prevent overfitting, the tolerance level is decreased only until a maximum number of piecewise polynomials is reached. In some cases, particularly for small datasets, the piecewise polynomials can still be superfluous. Therefore, the spline is replaced by a lower-degree polynomial if the generated trends are not physically realistic (e.g. excessive variability) or if that polynomial yields a goodness-of-fit sufficiently similar to that of the higher-degree fit. In addition, a function that classifies features in terms of shape and variability was introduced. The results are shown for a fictive dataset with an S-shape (Figure 2.c) and for two real anisole hydrodeoxygenation datasets by Otyuskaya et al. 7 (Figure 2.e and .f). For the conversion, the trend could be described by two shapes, while the selectivity could be described by a linear increase. The developed algorithm is hence able to follow the intuitive trends in data with a variable number of points, shape and variability.
Figure 2: Performance of state-of-the-art and developed algorithms.
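The tolerance-tightening and polynomial-fallback steps described above can be sketched as follows. This is not the authors' implementation, which the abstract does not give. The function name fit_trend and the parameters max_pieces, shrink and r2_margin are hypothetical, R-squared is assumed as the goodness-of-fit measure, and the physical-realism check and the feature classification are left out.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline


def r_squared(y, y_fit):
    """Coefficient of determination, used here as an assumed goodness-of-fit measure."""
    ss_res = np.sum((y - y_fit) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_trend(x, y, max_pieces=3, shrink=0.5, r2_margin=0.02, max_iter=40):
    """Hypothetical sketch: tighten the tolerance, then try simpler polynomials."""
    # Start from a tolerance that a single cubic polynomial always satisfies.
    s = float(np.sum((y - np.mean(y)) ** 2))
    best = UnivariateSpline(x, y, k=3, s=s)
    for _ in range(max_iter):
        s *= shrink  # decrease the tolerance step by step
        candidate = UnivariateSpline(x, y, k=3, s=s)
        if len(candidate.get_knots()) - 1 > max_pieces:
            break  # too many pieces: keep the previous, simpler spline
        best = candidate
    r2_spline = r_squared(y, best(x))
    # Replace the spline if a lower-degree polynomial fits almost as well.
    for degree in (1, 2):
        poly = np.polynomial.Polynomial.fit(x, y, degree)
        if r_squared(y, poly(x)) >= r2_spline - r2_margin:
            return f"polynomial (degree {degree})", poly
    return "spline", best


# Usage on a synthetic, noise-free S-shaped profile (illustrative data only).
x_demo = np.linspace(0.0, 10.0, 30)
y_demo = 1.0 / (1.0 + np.exp(-3.0 * (x_demo - 5.0)))
kind, model = fit_trend(x_demo, y_demo)
print(kind)
```

Capping the number of pieces rather than fixing the tolerance lets the same settings be reused across datasets with different noise levels, at the cost of one extra tuning parameter.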
From descriptors to structure: a case study
In order to establish relationships between kinetically relevant catalyst features and parameters that can be tuned during the synthesis procedure, data covering a diverse range of catalyst performance are required. Fortunately, in the field of Oxidative Coupling of Methane (OCM), studies involving more than a few catalysts have been carried out, paving the way for their use in catalyst design even before the advent of Open Access data. The most prominent case comprises forty-four catalysts tested at similar operating conditions 8, i.e. small catalytic data.
To match experimentally observed performances with simulated ones and potentially draw significant relationships, microkinetic simulations were combined with statistical tools. The microkinetic simulations were carried out using a state-of-the-art model 9, varying the catalyst descriptors by means of Design of Experiments. Combining the simulated catalysts with those from the referred dataset resulted in the identification of four clusters of catalysts with distinct performances (Figure 3). Interestingly, a cluster of optimally performing catalysts (blue) could be identified, but it contains no tested catalysts. More importantly, by comparing the descriptors of the different clusters, relationships are being drawn between the composition and properties of the catalysts tested by Kondratenko et al. 8 and the simulated catalyst descriptors.
Figure 3: Comparison of experimental 8 (closed symbols) and simulated (open symbols) data for OCM catalysts at iso-operating conditions. The color code distinguishes clusters of catalysts with comparable performance.
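The clustering of combined experimental and simulated performances can be illustrated with a small sketch. The abstract does not state which clustering method was used, so Ward hierarchical clustering from SciPy is an assumption here. The performance values below are synthetic placeholders (assumed to be methane conversion and C2 selectivity in %), not the data of Kondratenko et al. or the microkinetic simulations.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(1)

# Synthetic (conversion, selectivity) pairs scattered around four placeholder centres.
centers = np.array([[10.0, 30.0], [20.0, 50.0], [30.0, 40.0], [35.0, 70.0]])
simulated = np.vstack([c + rng.normal(0.0, 1.0, (15, 2)) for c in centers])
# The "experimental" points deliberately cover only the first three regions.
experimental = np.vstack([c + rng.normal(0.0, 1.0, (8, 2)) for c in centers[:3]])

performance = np.vstack([experimental, simulated])
is_experimental = np.arange(len(performance)) < len(experimental)

# Standardize both performance indicators so neither dominates the distances.
scaled = (performance - performance.mean(axis=0)) / performance.std(axis=0)

# Ward linkage, cut so that at most four clusters are returned.
labels = fcluster(linkage(scaled, method="ward"), t=4, criterion="maxclust")

# Clusters that contain simulated catalysts only, i.e. no tested catalyst.
untested = [int(c) for c in np.unique(labels) if not is_experimental[labels == c].any()]
print("clusters without tested catalysts:", untested)
```

A cluster that contains only simulated points plays the role of the blue cluster in Figure 3: a performance region reached in simulation but not yet by any tested catalyst.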
Summary
The lack of curated and standardized Open Access data on catalysis precludes its use at present, but the ongoing policies will overcome this obstacle. This will generate small catalytic data, for which machine learning techniques are not adapted. To make efficient use of such data, a tool for automated kinetic information extraction is under development which can, as of today, recognize the relevant patterns in small data, mimicking the intuition of a researcher. Finally, a methodology able to extract knowledge for catalyst design from typical catalytic data has also been developed.
Acknowledgments
Funding was granted by Ghent University BOF (BOF18/PDO/093) and the European Commission (ERC Grant No. 615456).
References
1. Van der Borght, K. et al., Catalysts 2015, 5, 1948.
2. Chiang, L. et al., Annu. Rev. Chem. Biomol. Eng. 2017, 8, 63-85.
3. Chen, S. et al., Renewable and Sustainable Energy Rev. 2019, 101, 568-589.
4. Wilkinson, M. D. et al., Scientific Data 2016, 3, 160018.
5. Caruthers, J. M. et al., J. Catal. 2003, 216, 98-109.
6. Bellussi, G. et al., Catal. Sci. Technol. 2013, 3, 833-857.
7. Otyuskaya, D. et al., Energy Fuels 2017, 31, 7082-7092.
8. Kondratenko, E. V. et al., Catal. Sci. Technol. 2015, 5, 1668-1677.
9. Pirro, L. et al., Ind. Eng. Chem. Res. 2018, 57, 16295-16307.