The commercial success of prospective biopharmaceutical products is largely contingent upon a timely and efficient process development campaign. Implemented through the Quality by Design (QbD) framework, mathematical and statistical process models exert significant influence by intrinsically linking the quality attributes of a biopharmaceutical product with a set of prominent process parameters. With regard to upstream cell culture operations in particular, the influence of the process environment on biopharmaceutical product quality remains significant, requiring extensive experimentation during development to enable detailed characterisation of the process design space and contribute towards understanding the underlying process dynamics. Dynamic process models, which capture the time-dependent kinetics of the cell culture process, are essential for gaining insight into these intrinsic process dynamics. However, deployment of dynamic models within an upstream biopharmaceutical process development context is often impeded by manually intensive tasks associated with the processing and analysis of experimental cell culture datasets. Automation of these routine tasks within a systematic workflow, driven by digital tools and Industry 5.0 principles, enables a shift in focus towards actionable insight. This transition accelerates development activities, fostering greater innovation and efficiency during biopharmaceutical process development.
In this work, an automated data processing and analysis workflow is presented, based upon the general requirements of a fed-batch CHO cell culture process for the manufacture of therapeutic monoclonal antibodies (mAbs). Outlier identification, missing value imputation, and extended feature engineering are addressed systematically, enabling enhanced exploration of all quantified state and process variables of the studied cell culture process when coupled with multivariate analysis techniques. Hierarchical clustering and principal component analysis are applied within the automated framework to provide deeper insight into the relationships between state and process variables, while supporting decisions regarding the structure of the devised dynamic models.
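As a minimal illustration of the preprocessing and multivariate analysis steps described above, the sketch below applies IQR-based outlier flagging, interpolation-based imputation, principal component analysis, and hierarchical clustering to a single run. The file name, column names, and library choices (pandas, scikit-learn, SciPy) are illustrative assumptions rather than the exact implementation of the presented workflow.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from scipy.cluster.hierarchy import linkage, fcluster

# Illustrative column names; the actual workflow operates on all quantified
# state and process variables of the fed-batch CHO culture.
df = pd.read_csv("ambr250_run.csv")  # hypothetical export of a single run
variables = ["viable_cell_density", "glucose", "lactate", "ammonia", "titre"]

# 1. Outlier identification: flag points outside a robust IQR fence per variable.
def flag_outliers(series, k=1.5):
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

for col in variables:
    df.loc[flag_outliers(df[col]), col] = np.nan

# 2. Missing value imputation: linear interpolation along each measured profile.
df[variables] = df[variables].interpolate(method="linear", limit_direction="both")

# 3. Multivariate analysis on standardised data.
X = StandardScaler().fit_transform(df[variables])
scores = PCA(n_components=2).fit_transform(X)           # principal component scores per sample
clusters = fcluster(linkage(X.T, method="ward"),         # hierarchical clustering of variables
                    t=3, criterion="maxclust")
```

In the presented framework, steps of this kind would be applied consistently across all runs and variables before any clustering or scoring is interpreted.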
To support the development of hybrid dynamic models, which combine machine learning with mechanistic knowledge, recursive feature selection techniques were developed and applied within the data processing framework to identify the most appropriate input parameters for the data-driven and machine learning elements. Kinetic rates governing cell growth and metabolism were defined on a cell-specific basis and estimated directly from experimental data to support model training and validation. The devised automated workflow was trialled and tested on a series of CHO cell culture datasets generated using a high-throughput AMBR250 cell culture system. The strategy yields a significant reduction in the time and resource requirements for dynamic model assembly and supports model deployment within a process development context, while providing deeper insight into the generated experimental data.
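For illustration, cell-specific kinetic rates of the kind described above can be estimated from measured time courses by finite differences. The sketch below computes a specific growth rate and a specific glucose consumption rate for hypothetical data, omitting the feed and dilution corrections that a full fed-batch mass balance would include; it is not the exact estimation scheme used in the presented workflow.

```python
import numpy as np

# Illustrative time-course arrays from a single fed-batch run (hypothetical values).
t   = np.array([0., 24., 48., 72., 96.])     # culture time [h]
xv  = np.array([0.5, 1.2, 2.8, 5.5, 8.0])    # viable cell density [1e6 cells/mL]
glc = np.array([6.0, 5.2, 4.0, 2.5, 1.0])    # glucose concentration [g/L]

# Cell-specific rates via finite differences:
#   mu    = (1/Xv) * dXv/dt    (specific growth rate, 1/h)
#   q_glc = -(1/Xv) * dGlc/dt  (specific consumption rate; feed terms omitted here)
mu    = np.gradient(xv, t) / xv
q_glc = -np.gradient(glc, t) / xv
```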
Furthermore, the systematic data processing framework facilitates efficient data management, ensuring structured storage, retrieval, and integration of cell culture process datasets when coupled with an advanced cloud-based knowledge management platform. Automated strategies for data processing, analysis, and model deployment are well poised to advance the application of mathematical models during process development, contributing towards enhanced efficiency and accelerated development strategies. Ultimately, the presented framework paves the way for the advancement of bioprocess digital twins, which aim to provide a virtual representation of the process, thus enabling enhanced monitoring, computational optimisation, and informed decision-making throughout development and manufacture.