2025 AIChE Annual Meeting

(156f) Automated Capture of Crystallization Outcomes Via Deep Learning Image Analysis in the CMAC Crystallization Screening Datafactory

Authors

Parandeep Sandhu - Presenter, University of Strathclyde
Christopher Boyle, CMAC/University of Strathclyde
Christos Tachtatzis, University of Strathclyde
Javier Cardona, University of Strathclyde
Crystallization is a fundamental process in pharmaceutical manufacturing: the solid-state properties of a drug, such as crystal size, shape and form, directly affect downstream processes like filtration, drying, and formulation [1]. Because these properties depend strongly on the crystallization conditions, high-throughput crystallization screening platforms have become indispensable tools in modern pharmaceutical development. These systems enable researchers to rapidly explore a wide range of crystallization conditions, accelerating the identification of optimal process parameters while reducing material usage and development timelines [2]. The CMAC Crystallization Screening DataFactory (CSDF) is a state-of-the-art automated platform for small-scale crystallization, designed to accelerate process development through high-throughput experimentation and real-time analysis (see Figure 1). By combining automation with AI-driven insights and adaptive experiment planning, the CSDF efficiently screens crystallization conditions while monitoring critical attributes such as particle size and morphology, as well as key process parameters influencing solubility and kinetics. It also enables early detection of issues like fouling, where solid deposits accumulate on equipment surfaces, helping to maintain heat transfer efficiency, ensure smooth flow, and safeguard product quality. This data-driven workflow provides essential insights for process scale-up, helping laboratory findings translate into robust, scalable manufacturing strategies. By automating the analysis of large volumes of crystallization data, the CSDF supports informed decision-making and improves consistency, efficiency, and product quality at industrial scale. The platform operates through four key stages:
1) Dosing, where drug and solvent are precisely dispensed;
2) Measurement, which uses the Technobis Crystalline system to carry out and monitor crystallization in real time, with additional analytical probes, such as Raman spectroscopy and X-ray diffraction (XRD), providing further experimental insight;
3) Data Analysis, where artificial intelligence (AI) extracts parameters such as solubility, nucleation and growth kinetics, and crystal habit; and
4) Experiment Planning, which uses Bayesian optimisation to refine conditions based on real-time data.
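The experiment-planning stage can be illustrated with a minimal Bayesian optimisation loop. The abstract does not specify the surrogate model, kernel, or acquisition function used in the CSDF, so this sketch assumes a Gaussian process with a squared-exponential kernel and expected improvement over a single normalised process condition; `run_experiment` is a hypothetical stand-in for a real screening experiment and its score.

```python
import math

import numpy as np


# Assumed surrogate: zero-mean Gaussian process with a squared-exponential kernel.
def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays of conditions."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    """Posterior mean and standard deviation of the GP at the query points."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    v = np.linalg.solve(L, K_s)
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)  # prior variance is 1.0
    return K_s.T @ alpha, np.sqrt(var)


# Assumed acquisition function: expected improvement (maximisation).
def expected_improvement(mean, std, best, xi=0.01):
    z = (mean - best - xi) / std
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best - xi) * cdf + std * pdf


def propose_next(x_obs, y_obs, candidates):
    """Fit the GP to centred scores and return the candidate with the highest EI."""
    offset = y_obs.mean()
    mean, std = gp_posterior(x_obs, y_obs - offset, candidates)
    ei = expected_improvement(mean + offset, std, y_obs.max())
    return candidates[np.argmax(ei)]


def run_experiment(x):
    # Hypothetical stand-in for one screening experiment; higher is better.
    return -(x - 0.63) ** 2


candidates = np.linspace(0.0, 1.0, 201)  # one normalised process condition
x_obs = np.array([0.1, 0.5, 0.9])        # initial design
y_obs = np.array([run_experiment(x) for x in x_obs])
for _ in range(8):
    x_next = propose_next(x_obs, y_obs, candidates)
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, run_experiment(x_next))

best_x = x_obs[np.argmax(y_obs)]
print(f"best condition after {len(x_obs)} experiments: {best_x:.3f}")
```

Each iteration refits the surrogate to all results so far and picks the condition with the highest expected improvement, trading off exploitation of promising regions against exploration of uncertain ones. A real planner would typically work over several conditions at once and could propose batches of experiments rather than one at a time.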

In this study, we present the data analysis methods employed within the CSDF. The Technobis Crystalline v2 apparatus is equipped with a high-resolution in-situ camera for real-time imaging, a light detector that measures solution transmissivity, and a temperature sensor for tracking heating and cooling profiles. Using the images from this camera, we explore various computer vision techniques aimed at detecting and characterising crystallization phenomena, including a Convolutional Neural Network (CNN) trained to categorise ten distinct crystallization outcomes, as illustrated in Figure 2. These include crystal habits such as block, needle, plate, and elongated shapes, as well as images in which an object is present but lacks a clearly defined shape. The model also identifies unwanted outcomes that may arise during crystallization, such as agglomerated crystals, air bubbles, liquid droplets, overly concentrated solutions (which produce blacked-out images due to rapid nucleation and growth), and unidentified floating objects (e.g., dust particles, hair strands, or other foreign artefacts).
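Because a single frame can show, for example, needle-like crystals alongside an air bubble, the ten outcomes are naturally treated as independent yes/no decisions rather than as one mutually exclusive class. The abstract does not describe the CNN's output layer; this minimal sketch assumes one logit per label, each passed through its own sigmoid and compared with its own threshold. The label names and logits are hypothetical.

```python
import numpy as np

# Hypothetical label names for the ten outcomes listed above.
LABELS = [
    "block", "needle", "plate", "elongated", "undefined_shape",
    "agglomerated", "air_bubble", "liquid_droplet", "blacked_out", "foreign_object",
]


def sigmoid(logits):
    return 1.0 / (1.0 + np.exp(-logits))


def decode(logits, thresholds):
    """Multi-label decoding: each label gets its own sigmoid and its own threshold,
    so several outcomes can be reported for the same frame."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return [name for name, p, t in zip(LABELS, probs, thresholds) if p >= t]


# Made-up network outputs for a frame showing agglomerated needles next to a bubble.
frame_logits = [-3.0, 2.5, -2.0, -1.0, -4.0, 1.2, 0.3, -5.0, -6.0, -3.5]
print(decode(frame_logits, [0.5] * len(LABELS)))
```

A softmax head would instead force the probabilities to sum to one, so it could only ever report a single dominant outcome per frame.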

This multi-label classification approach enables the detection of multiple phenomena within a single image, providing a comprehensive view of complex crystallization behaviours. Importantly, it allows rapid, automated analysis of experiments, delivering immediate feedback on crystal habit and process conditions to support real-time decision-making during high-throughput screening. To optimise model performance, an individual decision threshold for each label was tuned by cross-validation on over 150,000 annotated images, selecting the thresholds that maximised the micro-averaged F1 score. This threshold calibration improved the model's ability to distinguish between subtle differences in crystallization outcomes, especially where features overlap. The final model was evaluated on a separate test set of 25,000 unseen images, curated to ensure a balanced distribution across all labels. On this dataset, the model achieved a mean micro-averaged F1 score of 96.2%, demonstrating its robustness and generalisability across diverse crystallization scenarios. Figure 3 shows the classifier's predictions across a crystallization experiment, illustrating its ability to detect multiple phenomena within each frame.
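Per-label threshold calibration against a micro-averaged F1 score can be sketched as follows. Micro-averaging pools true positives, false positives, and false negatives over all labels before computing F1, so frequent labels carry more weight than rare ones. The abstract does not give the search strategy; coordinate ascent over a fixed grid is an assumption here, and the data are synthetic. In the cross-validated setting described above, the same search would run on out-of-fold predictions.

```python
import numpy as np


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP and FN over every label before computing F1."""
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Assumed search: coordinate ascent over a fixed threshold grid.
def tune_thresholds(y_true, probs, grid=np.linspace(0.05, 0.95, 19), n_rounds=2):
    """Adjust one label's threshold at a time, keeping a change only if it
    raises micro-F1 over all labels."""
    thresholds = np.full(probs.shape[1], 0.5)
    for _ in range(n_rounds):
        for j in range(probs.shape[1]):
            best_t = thresholds[j]
            best_score = micro_f1(y_true, probs >= thresholds)
            for t in grid:
                thresholds[j] = t
                score = micro_f1(y_true, probs >= thresholds)
                if score > best_score:
                    best_t, best_score = t, score
            thresholds[j] = best_t
    return thresholds


# Synthetic validation data: 3 labels, positives score higher on average.
rng = np.random.default_rng(0)
y_val = rng.random((500, 3)) < 0.3
p_val = np.clip(0.35 * y_val + 0.6 * rng.random((500, 3)), 0.0, 1.0)

tuned = tune_thresholds(y_val, p_val)
print("default 0.5:", round(micro_f1(y_val, p_val >= 0.5), 3))
print("tuned:      ", round(micro_f1(y_val, p_val >= tuned), 3), tuned)
```

Because each step starts from the current score and only accepts improvements, the tuned thresholds can never score worse than the 0.5 starting point on the data they were tuned on.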


While the classifier effectively identifies the presence and habit of crystals, it does not provide information on individual particle size. To address this, we integrated an instance segmentation model based on Ultralytics YOLOv8 to enable particle-level analysis [3]. YOLOv8 was selected for its balance between model size, inference speed, and accuracy. The model, pre-trained on the Common Objects in Context (COCO) dataset, which contains approximately 200,000 annotated images [4], was fine-tuned on a dataset of 217 crystallization images annotated for individual crystal particles. After training, the model achieved a mean Average Precision (mAP) of 56.7%, showing that it can detect and localise individual crystals. This enabled particle size distributions (PSDs) to be extracted directly from the segmented instances, providing sizing information that classification alone cannot capture.
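Once each crystal has its own mask, a number-weighted PSD follows from converting mask areas into a size descriptor. The abstract does not state which descriptor was used; this minimal sketch assumes the equivalent circle diameter and a hypothetical calibration of 2 µm per pixel. The masks are synthetic boolean arrays of shape (instances, height, width), the layout obtained by stacking per-instance masks from a segmentation model.

```python
import numpy as np


def equivalent_circle_diameters(masks, um_per_pixel):
    """Diameter of the circle whose area equals each instance mask's area."""
    areas_px = masks.reshape(len(masks), -1).sum(axis=1)
    areas_um2 = areas_px * um_per_pixel**2
    return 2.0 * np.sqrt(areas_um2 / np.pi)


def number_psd(diameters, bin_edges):
    """Number-weighted PSD: fraction of detected particles in each size bin."""
    counts, _ = np.histogram(diameters, bins=bin_edges)
    return counts / max(counts.sum(), 1)


# Three synthetic square "crystals" of 10, 20 and 30 px side on a 100 x 100 frame.
masks = np.zeros((3, 100, 100), dtype=bool)
for k, side in enumerate([10, 20, 30]):
    masks[k, 5:5 + side, 5:5 + side] = True

diameters = equivalent_circle_diameters(masks, um_per_pixel=2.0)  # hypothetical scale
print(np.round(diameters, 1))
print(number_psd(diameters, bin_edges=[0, 25, 50, 75, 100]))
```

Equivalent circle diameter understates the length of needle-like crystals, so a length-based descriptor (for example, maximum Feret diameter) may be more informative for elongated habits.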

To automate the determination of crystallization thermodynamic data, specifically the clear point (where all particles have dissolved) and the cloud point (where nucleation occurs), we extracted a variety of features from the in-situ imaging data. Namely, we used the mean, span (maximum minus minimum), and variance of pixel intensity, together with the structural similarity index (SSIM), a perceptual similarity measure, computed relative to the preceding image; together these capture changes in brightness, contrast, and texture. We also included HELM5, a region-based metric that relates the distribution and extent of bright regions across the image; because it is normalised across the image, it tracks global brightness changes and is sensitive to the gradual transformations that accompany nucleation or dissolution. Alongside image features, we incorporated transmissivity as an additional input to track physical measurements from the crystallization process. Finally, the classifier outputs were used to estimate the likelihood of key crystallization outcomes, such as particles, needle-like crystals, elongated crystals, plate-like crystals, and agglomerated crystals. This broad feature set was used to train various machine learning models to detect the onset of clear and cloud points, and the automated detections were validated against manual annotations. A similar approach could be used to extract crystallization kinetics from induction time measurements.

We also applied the extracted features to detect fouling, in which particles become stuck on the imaging window. Fouling can interfere with accurate determination of clear and cloud points: the solution concentration may no longer be representative, and a particle that persists throughout both the heating and cooling profiles may be misinterpreted as ongoing crystallization.
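A minimal sketch of the per-frame image features described above, plus a simple transmissivity rule for locating clear and cloud points. Several simplifications are assumptions: SSIM is computed over the whole frame as a single window (standard SSIM averages local windows), HELM5 and the classifier probabilities are omitted because their exact definitions are not given here, and `baseline_clear_cloud` is a threshold rule standing in for the trained machine learning models, which would instead take the feature rows as input. All thresholds and data are synthetic.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0):
    """SSIM computed over the whole frame as one window (standard SSIM averages
    this quantity over local, usually Gaussian-weighted, windows)."""
    x, y = x.astype(float), y.astype(float)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (x.var() + y.var() + c2)
    )


def frame_features(frames, transmissivity):
    """Per-frame rows: [mean, span, variance, SSIM to previous frame, transmissivity]."""
    rows = []
    for i, frame in enumerate(frames):
        prev = frames[i - 1] if i > 0 else frame
        f = frame.astype(float)
        rows.append([f.mean(), f.max() - f.min(), f.var(),
                     global_ssim(prev, frame), transmissivity[i]])
    return np.array(rows)


# Rule-based baseline with arbitrary thresholds (not a trained model).
def baseline_clear_cloud(temperature, transmissivity, clear_level=95.0, cloud_level=90.0):
    """Clear point: first heating frame at or above clear_level.
    Cloud point: first later cooling frame below cloud_level."""
    dT = np.diff(temperature, prepend=temperature[0])
    n = len(temperature)
    clear = next((i for i in range(n) if dT[i] > 0 and transmissivity[i] >= clear_level), None)
    if clear is None:
        return None, None
    cloud = next((i for i in range(clear + 1, n)
                  if dT[i] < 0 and transmissivity[i] < cloud_level), None)
    return temperature[clear], (temperature[cloud] if cloud is not None else None)


# Synthetic heat (20 -> 60 C) and cool (59 -> 20 C) cycle; transmissivity in %.
temperature = np.concatenate([np.linspace(20, 60, 41), np.linspace(59, 20, 40)])
heating = np.arange(len(temperature)) <= 40
transmissivity = np.where(heating, np.where(temperature >= 45, 100.0, 10.0),
                          np.where(temperature > 30, 100.0, 10.0))
rng = np.random.default_rng(1)
frames = [np.clip(rng.normal(20 + 2 * t, 5, (32, 32)), 0, 255).astype(np.uint8)
          for t in transmissivity]

features = frame_features(frames, transmissivity)
print(features.shape, baseline_clear_cloud(temperature, transmissivity))
```

A fixed-threshold rule like this is exactly what fouling breaks: a particle stuck on the window keeps transmissivity low through heating, so no clear point is found, which is one motivation for learning from the richer feature rows instead.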


This paper explores the integration of AI and process analytical technology (PAT) data analysis within the CMAC CSDF, demonstrating how deep learning models can automate crystallization monitoring with high accuracy. While the system shows promising capabilities for accelerating process development and enhancing data quality, challenges remain, particularly in generalising models across diverse compounds, solvents, and experimental setups. However, the continued growth of the CSDF and the inclusion of increasingly diverse datasets, spanning different active pharmaceutical ingredients (APIs), solvent systems, and probe configurations, are driving improvements in model robustness and generalisability. These expanding datasets not only enhance the performance of machine learning and deep learning algorithms but also reflect the ongoing advancement of the automation platform itself. We discuss these developments and highlight opportunities to further improve data annotation pipelines and integrate multi-modal PAT data for a more comprehensive and scalable approach to crystallization process understanding.

References
[1] Conrad Meyer, Arjun Arora, and Stephan Scholl. A method for the rapid creation of AI driven crystallization process controllers. Computers & Chemical Engineering, 186:108680, 2024.
[2] Parisa Shiri, Veronica Lai, Tara Zepel, Daniel Griffin, Jonathan Reifman, Sean Clark, Shad Grunert, Lars P.E. Yunker, Sebastian Steiner, Henry Situ, Fan Yang, Paloma L. Prieto, and Jason E. Hein. Automated solubility screening platform using computer vision. iScience, 24(3):102176, 2021.
[3] Glenn Jocher, Ayush Chaurasia, and Jing Qiu. Ultralytics YOLOv8, 2023.
[4] Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft COCO: Common Objects in Context. arXiv, 2014.