The market for protein- and peptide-based therapeutics is projected to grow to 566 billion USD by 2030, and the manufacturing bottleneck for these products has shifted downstream.1,2 Crystallisation, as a protein and peptide purification method, offers high productivity, cost-effectiveness, and a highly pure solid product that can be easily stored and dosed to the patient.3 However, its adoption has been slow because crystallisation phenomena are poorly understood, which has led to poor control over the process and its key performance variables.3
The usefulness of mechanistic computational models for improving the understanding, design and control of crystallisation processes is well established. However, their formulation and parametrisation introduce challenges with respect to the choice of process kinetics, the simplifying assumptions used, and the model structure itself, all of which affect how well a mechanistic model reproduces experimental observations.4–6 Data-driven models have therefore been proposed as alternatives that map measured data directly to output predictions via statistical frameworks.7 Machine learning (ML)-based approaches using Gaussian process regressors, neural networks, time-series transformers, and Neural Ordinary Differential Equations (NODEs) have all been successfully applied in process contexts, using historical batches and high-frequency data from Process Analytical Technology (PAT) probes to predict and control process behaviour as a function of the measured operating variables.8–11
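For orientation, a NODE in its standard form parameterises the time derivative of the process state with a neural network and obtains predictions by numerical integration. The notation below is an assumption introduced here for illustration (state x, measured operating variables u, network weights θ, sampling times t_k) and is not taken from this work:

```latex
\frac{\mathrm{d}x(t)}{\mathrm{d}t} = f_{\theta}\bigl(x(t),\, u(t)\bigr), \qquad x(t_0) = x_0,
\qquad
\hat{x}(t_k) = x_0 + \int_{t_0}^{t_k} f_{\theta}\bigl(x(\tau),\, u(\tau)\bigr)\,\mathrm{d}\tau .
```

Training then adjusts θ so that the integrated trajectory \hat{x}(t_k) matches the measured trajectory at the sampling times.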
Successful formulation of a data-driven model relies on high data quality and availability. Operational modifications, such as transfer to new equipment or changes in process conditions, can alter process behaviour enough to create a data-scarce context, in which too little data is available to build a robust data-driven model. It therefore becomes critical to leverage existing, historical knowledge of related and well-characterised processes during model building, thereby reducing the data requirement.12
In this work, we evaluate the capabilities of transfer learning (TL) for NODEs. NODEs are trained against in-silico simulations of unseeded and template-assisted crystallisation. We first confirm that the platform process has enough historical data to build a robust data-driven surrogate, whereas the limited data available for the target process cannot, on its own, support a reliable data-driven surrogate. Next, different TL techniques, including model fine-tuning and neural network layer freezing, are assessed. Results indicate that the existing data-driven models and a single training trajectory can be used to formulate a reliable data-driven crystallisation model. Enabling knowledge transfer across data-driven models of related processes thus successfully reduces data requirements.
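The combination of fine-tuning and layer freezing can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration rather than the implementation used in this work: the toy first-order decay `toy_process` stands in for the crystallisation simulations, the rate constants and network size are arbitrary, explicit Euler replaces an adaptive ODE solver, and finite-difference gradients with a simple accept-or-shrink step replace backpropagation through the solver. All function and variable names are hypothetical.

```python
import math
import random

def rhs(x, p):
    """Neural right-hand side f_theta(x): one tanh hidden layer, scalar state."""
    hidden = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    return sum(w * h for w, h in zip(p["w2"], hidden)) + p["b2"][0]

def simulate(x0, f, n_steps=20, dt=0.1):
    """Explicit Euler integration of dx/dt = f(x); returns the trajectory."""
    xs = [x0]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * f(xs[-1]))
    return xs

def loss(p, data):
    """Mean squared error between NODE rollouts and (x0, trajectory) pairs."""
    total, count = 0.0, 0
    for x0, target in data:
        pred = simulate(x0, lambda x: rhs(x, p), n_steps=len(target) - 1)
        total += sum((a - b) ** 2 for a, b in zip(pred, target))
        count += len(target)
    return total / count

def train(p, data, trainable, n_iter=300, lr=0.5, eps=1e-6):
    """Gradient descent on the parameter groups in `trainable`; all others are frozen.

    Gradients are forward finite differences (a stand-in for backpropagation).
    A step is accepted only if it lowers the loss; otherwise the step size is halved.
    """
    p = {k: list(v) for k, v in p.items()}  # work on a copy of the input
    current = loss(p, data)
    for _ in range(n_iter):
        trial = {k: list(v) for k, v in p.items()}
        for name in trainable:
            for i, old in enumerate(p[name]):
                p[name][i] = old + eps
                grad = (loss(p, data) - current) / eps
                p[name][i] = old  # restore the unperturbed value
                trial[name][i] = old - lr * grad
        trial_loss = loss(trial, data)
        if trial_loss < current:
            p, current = trial, trial_loss
        else:
            lr *= 0.5
    return p, current

def toy_process(k, c_sat=0.2):
    """Hypothetical stand-in dynamics: first-order decay towards c_sat."""
    return lambda c: -k * (c - c_sat)

# Data-rich platform process (several trajectories) and data-scarce target (one).
platform_data = [(c0, simulate(c0, toy_process(1.0))) for c0 in (0.6, 0.9, 1.2)]
target_data = [(1.0, simulate(1.0, toy_process(1.8)))]

random.seed(0)
n_hidden = 4
init = {
    "w1": [random.uniform(-1, 1) for _ in range(n_hidden)],
    "b1": [random.uniform(-1, 1) for _ in range(n_hidden)],
    "w2": [random.uniform(-1, 1) for _ in range(n_hidden)],
    "b2": [0.0],
}

# Step 1: train every layer on the platform process.
pretrained, platform_loss = train(init, platform_data, ("w1", "b1", "w2", "b2"))

# Step 2: fine-tune on the single target trajectory with the hidden layer frozen.
loss_before = loss(pretrained, target_data)
tuned, loss_after = train(pretrained, target_data, trainable=("w2", "b2"))

print(f"target loss before fine-tuning: {loss_before:.3e}")
print(f"target loss after fine-tuning:  {loss_after:.3e}")
```

Passing all four parameter groups to `trainable` in the second step gives full fine-tuning instead, so the two strategies can be compared on the same single trajectory. In a practical setting the same pattern would normally be written with an automatic-differentiation library and a dedicated ODE solver.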
Acknowledgments
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) for the Imperial College London Doctoral Training Partnership (DTP) and by AstraZeneca UK Ltd through a CASE studentship award.
References
(1) Yadav, M. K.; Sahu, A.; Anu; Kasturria, N.; Priyadarshini, A.; Gupta, A.; Gupta, K.; Tomar, A. K. Clinical Applications of Protein-Based Therapeutics. In Protein-based Therapeutics; Singh, D. B., Tripathi, T., Eds.; Springer Nature: Singapore, 2023; pp 23–47. https://doi.org/10.1007/978-981-19-8249-1_2.
(2) Li, Y.; Stern, D.; Lock, L. L.; Mills, J.; Ou, S.-H.; Morrow, M.; Xu, X.; Ghose, S.; Li, Z. J.; Cui, H. Emerging Biomaterials for Downstream Manufacturing of Therapeutic Proteins. Acta Biomaterialia 2019, 95, 73–90. https://doi.org/10.1016/j.actbio.2019.03.015.
(3) Ferreira, J.; Araújo, S.; Ferreira, A.; Teixeira, J.; de Campos, J. M.; Rocha, F.; Castro, F. Insulin Nucleation Kinetics in an Oscillatory Flow-Based Platform: Protein Crystallization as a Highly Reproducible Separation Process. Chemical Engineering Research and Design 2024, 203, 453–466. https://doi.org/10.1016/j.cherd.2024.01.057.
(4) Orosz, Á.; Szilágyi, E.; Spaits, A.; Borsos, Á.; Farkas, F.; Markovits, I.; Százdi, L.; Volk, B.; Kátainé Fadgyas, K.; Szilágyi, B. Dynamic Modeling and Optimal Design Space Determination of Pharmaceutical Crystallization Processes: Realizing the Synergy between Off-the-Shelf Laboratory and Industrial Scale Data. Ind. Eng. Chem. Res. 2024, 63 (9), 4068–4082. https://doi.org/10.1021/acs.iecr.3c03954.
(5) Bhonsale, S. S.; Stokbroekx, B.; Van Impe, J. Assessment of the Parameter Identifiability of Population Balance Models for Air Jet Mills. Computers & Chemical Engineering 2020, 143, 107056. https://doi.org/10.1016/j.compchemeng.2020.107056.
(6) Fysikopoulos, D.; Benyahia, B.; Borsos, A.; Nagy, Z. K.; Rielly, C. D. A Framework for Model Reliability and Estimability Analysis of Crystallization Processes with Multi-Impurity Multi-Dimensional Population Balance Models. Computers & Chemical Engineering 2019, 122, 275–292. https://doi.org/10.1016/j.compchemeng.2018.09.007.
(7) Zhao, Y.; Jiang, C.; Vega, M. A.; Todd, M. D.; Hu, Z. Surrogate Modeling of Nonlinear Dynamic Systems: A Comparative Study. Journal of Computing and Information Science in Engineering 2023, 23 (1), 011001. https://doi.org/10.1115/1.4054039.
(8) Maceiczyk, R. M.; deMello, A. J. Fast and Reliable Metamodeling of Complex Reaction Spaces Using Universal Kriging. J. Phys. Chem. C 2014, 118 (34), 20026–20033. https://doi.org/10.1021/jp506259k.
(9) Lima, F. A. R. D.; de Moraes, M. G. F.; Secchi, A. R.; de Souza Jr., M. B. Development of a Recurrent Neural Networks-Based NMPC for Controlling the Concentration of a Crystallization Process. Digital Chemical Engineering 2022, 5, 100052. https://doi.org/10.1016/j.dche.2022.100052.
(10) Sitapure, N.; Kwon, J. S.-I. CrystalGPT: Enhancing System-to-System Transferability in Crystallization Prediction and Control Using Time-Series-Transformers. Computers & Chemical Engineering 2023, 177, 108339. https://doi.org/10.1016/j.compchemeng.2023.108339.
(11) Chiu, K.-C.; Du, D. A Neural Ordinary Differential Equation Model for Predicting the Growth of Chinese Hamster Ovary Cell in a Bioreactor System. Biotechnol Bioproc E 2025, 30 (1), 100–115. https://doi.org/10.1007/s12257-024-00141-2.
(12) Thebelt, A.; Wiebe, J.; Kronqvist, J.; Tsay, C.; Misener, R. Maximizing Information from Chemical Engineering Data Sets: Applications to Machine Learning. Chemical Engineering Science 2022, 252, 117469. https://doi.org/10.1016/j.ces.2022.117469.