2024 AIChE Annual Meeting

(519d) Enhancing Data-Driven Anomaly Detection in Biopharmaceutical Manufacturing through Domain Adaptation and Machine Learning: A Case Study on Industrial Freeze-Dryers

Authors

Badr, S., The University of Tokyo
Schmid, C., F. Hoffmann-La Roche
Knüppel, S., F. Hoffmann-La Roche
Sugiyama, H., The University of Tokyo
The biopharmaceutical industry's remarkable growth in approvals and sales underscores the need for robust and safe manufacturing processes that use production capacity efficiently and deliver high-quality products in line with regulatory requirements. Quality by Design (QbD) principles, coupled with the ongoing digitalization of pharmaceutical manufacturing, have paved the way for advanced data-driven approaches such as Machine Learning (ML) in process monitoring, anomaly detection, and condition monitoring. However, limitations in the quantity and quality of industrial process data can complicate the application of ML techniques to process monitoring [1]. For example, data drifts and variations across production periods can limit the data available for effective algorithm training.

To address these challenges, Domain Adaptation (DA) has emerged as a promising solution. DA techniques enable the transfer of knowledge from a source domain (e.g., real-time production data) to a target domain (e.g., historical data or data from a similar machine) while mitigating the effects of domain shifts. This approach is particularly relevant in industrial settings where data is sparse and data distributions may vary due to factors such as different operating modes, changes due to maintenance, or technical interventions [2]. In this study, we leverage DA and ML techniques for anomaly detection in two parallelized, industrial freeze-dryers during leak testing, addressing the aforementioned issues.

In batch operations, data shifts following a maintenance procedure force frequent retraining of a conventional anomaly detection model, which could severely lower the usefulness of such algorithms, especially given the frequency of maintenance operations. In the first case study presented in this work, DA in the form of mean alignment (MAL) is presented as an approach to minimize the number of batches required for training in each maintenance period by leveraging the data from previous periods. Mean values were estimated from initial leak tests after maintenance (source domain) and then used to align data from future leak tests to data from the period before maintenance (target domain). An anomaly detection model was then trained on the target domain data. The model comprises dimensionality reduction of the process parameters using Principal Component Analysis (PCA), followed by One-Class Support Vector Machines (OCSVM) to define the boundary between normal operating conditions and anomalies. Model performance (i.e., F1 score) was compared to that of a PCA/SVM model trained only on the initial leak tests in the source domain. The results clearly show that the model utilizing historical data through MAL requires a much smaller training set in the new period to achieve the same level of model performance.
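
The workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mean_align`, the synthetic data, the number of principal components, and the OCSVM settings (`nu`, `gamma`) are all assumptions chosen for illustration, and the abstract does not state whether features were standardized before PCA. The sketch estimates the mean shift from a handful of post-maintenance leak tests, applies it to later leak tests, and scores the aligned data with a PCA/OCSVM model trained on pre-maintenance data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM


def mean_align(X_new, X_source_init, X_target):
    """Shift new-period data so its estimated mean matches the old period.

    X_source_init: the few initial leak tests after maintenance (source).
    X_target:      leak tests from the period before maintenance (target).
    """
    shift = X_target.mean(axis=0) - X_source_init.mean(axis=0)
    return X_new + shift


# Synthetic stand-in for leak-test process parameters (hypothetical data).
rng = np.random.default_rng(0)
n_features = 6
X_target = rng.normal(0.0, 1.0, size=(200, n_features))
offset = rng.normal(0.0, 2.0, size=n_features)  # shift caused by maintenance
X_source_init = rng.normal(0.0, 1.0, size=(5, n_features)) + offset
X_source_new = rng.normal(0.0, 1.0, size=(50, n_features)) + offset

# PCA for dimensionality reduction, then a one-class SVM boundary,
# trained only on the pre-maintenance (target domain) data.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=3),
    OneClassSVM(kernel="rbf", nu=0.05, gamma="scale"),
)
model.fit(X_target)

# Align future leak tests to the old period, then score them.
X_aligned = mean_align(X_source_new, X_source_init, X_target)
labels = model.predict(X_aligned)  # +1 = normal, -1 = anomaly
```

Because only a mean vector has to be estimated in the new period, a few initial leak tests may suffice, whereas retraining the full PCA/OCSVM model would need enough batches to re-estimate the whole normal operating region.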

In a second case study, a similar approach was used to demonstrate that a PCA/SVM model trained on one freeze-dryer (target domain) can be transferred to another freeze-dryer (source domain) by aligning process data using Correlation Alignment (CORAL) [3]. The results show performance comparable to that of the model trained only on the initial leak tests in the source domain. This implies that DA can reduce modeling effort by transferring existing models to multiple machines, and it demonstrates the usefulness of DA when limited data is available for similar machines.
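
As a rough illustration of the alignment step, the following sketch implements the closed-form CORAL transform from [3]: the source data are whitened with the source covariance and then re-colored with the target covariance. The function names, the synthetic data, and the explicit centering with the target mean added back are assumptions made here for illustration (the original formulation assumes normalized features); how the study preprocessed the freeze-dryer data is not stated in the abstract.

```python
import numpy as np


def _sym_matrix_power(C, p):
    """Power of a symmetric positive-definite matrix via eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(C)
    return (eigvecs * eigvals**p) @ eigvecs.T


def coral(X_source, X_target, reg=1.0):
    """Align second-order statistics of X_source to those of X_target.

    reg is added to the diagonal of both covariance matrices, as in [3],
    so that the whitening step stays well conditioned.
    """
    d = X_source.shape[1]
    C_s = np.cov(X_source, rowvar=False) + reg * np.eye(d)
    C_t = np.cov(X_target, rowvar=False) + reg * np.eye(d)
    X_centered = X_source - X_source.mean(axis=0)
    whitened = X_centered @ _sym_matrix_power(C_s, -0.5)
    recolored = whitened @ _sym_matrix_power(C_t, 0.5)
    # Assumption: move the aligned data to the target mean as well.
    return recolored + X_target.mean(axis=0)


# Hypothetical data standing in for two freeze-dryers.
rng = np.random.default_rng(1)
X_target = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))  # machine with model
X_source = 2.0 * rng.normal(size=(300, 4)) + 5.0                # second machine

# A small reg makes the covariance match almost exact in this demo.
X_aligned = coral(X_source, X_target, reg=1e-6)
```

After alignment, data from the second machine can be passed to the anomaly detection model trained on the first one without retraining it. With the default `reg=1.0`, the covariance match is only approximate, which trades exactness for numerical stability when few samples are available.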

This work contributes to the ongoing research on data-driven anomaly detection and condition monitoring in the biopharmaceutical industry, highlighting the significance of DA as a powerful tool for addressing data-related challenges, specifically data sparsity and data shifts, and enhancing the performance of condition monitoring systems.

[1] C. Ji and W. Sun, “A Review on Data-Driven Process Monitoring Methods: Characterization and Mining of Industrial Data,” Processes, vol. 10, no. 2, Feb. 2022, doi: 10.3390/pr10020335.

[2] P. Zürcher, S. Badr, S. Knüppel, and H. Sugiyama, “Data-driven equipment condition monitoring and reliability assessment for sterile drug product manufacturing: Method and application for an operating facility,” Chemical Engineering Research and Design, vol. 188, pp. 301–314, Dec. 2022, doi: 10.1016/j.cherd.2022.09.005.

[3] B. Sun, J. Feng, and K. Saenko, “Return of Frustratingly Easy Domain Adaptation,” in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), 2016. [Online]. Available: www.aaai.org