Minimum ignition energy (MIE), the minimum amount of energy required to ignite a combustible dust at a given temperature, is a critical safety parameter in the design of processes that handle combustible dusts. Although accurate MIE measurements are essential, MIE testing may not be feasible at all stages of drug development, especially when active pharmaceutical ingredients (APIs) and intermediates are in short supply. An accurate classification model that distinguishes between compounds with low and high MIE can help save resources by eliminating the need to measure MIE for molecules classified as low risk (high MIE).
In this work, machine learning classification models were built using 3D descriptors derived from density functional theory (DFT) calculations together with molecular descriptors calculated with RDKit. The models were trained to distinguish between low-MIE (<10 mJ) and high-MIE (>10 mJ) compounds on two datasets, an internal dataset alone and the internal dataset complemented with literature data, and their performances were compared. Visualization of the data with unsupervised learning techniques such as principal component analysis showed that the descriptor set separates high- and low-MIE compounds to a good extent. Descriptors such as HOMO/LUMO levels, MolLogP (the calculated octanol/water partition coefficient), and carbon charges were found to be especially important markers of MIE. While up to 90% classification accuracy was reached with the 3D+RDKit descriptors on the combined internal and literature data, model performance was found to depend heavily on the distribution of compounds in the training and test datasets. In short, machine learning models showed promise as a quick way to distinguish between high-risk and low-risk pharmaceutical dusts based on their minimum ignition energy.
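The class definition and the sensitivity to the train/test distribution noted above can be illustrated with a minimal, standard-library-only sketch. It is not the models or the evaluation protocol used in this work. The helper names (`mie_class`, `stratified_split`, `balanced_accuracy`), the placeholder MIE values, and the choice to assign exactly 10 mJ to the high class are all assumptions made for illustration. The sketch labels compounds at the 10 mJ boundary, splits them so both classes appear in the test set, and shows why plain accuracy can look good on a skewed test set even for a classifier that never predicts the low-MIE class.

```python
import random
from collections import defaultdict

MIE_THRESHOLD_MJ = 10.0  # class boundary stated in the text


def mie_class(mie_mj):
    """Label a compound 'low' (< 10 mJ) or 'high' otherwise.

    Assumption: a value of exactly 10 mJ is assigned to 'high';
    the text does not specify this case.
    """
    return "low" if mie_mj < MIE_THRESHOLD_MJ else "high"


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), sampling each class at the same rate."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, insensitive to class imbalance."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


# Placeholder MIE values in mJ, invented for illustration only.
mie_values_mj = [1.5, 3.0, 4.0, 8.0, 12.0, 30.0, 100.0, 500.0]
labels = [mie_class(v) for v in mie_values_mj]
train_idx, test_idx = stratified_split(labels)

# A skewed hypothetical test set and a classifier that always says 'high'.
skewed_true = ["high"] * 8 + ["low"] * 2
always_high = ["high"] * 10
plain = accuracy(skewed_true, always_high)
balanced = balanced_accuracy(skewed_true, always_high)
```

On the skewed hypothetical test set, the always-high classifier scores well on plain accuracy while missing every low-MIE (high-risk) compound, which balanced accuracy exposes. Reporting a class-balanced metric alongside accuracy, and stratifying the split by class, is one common way to make results less dependent on how compounds happen to be distributed between the training and test sets.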