2025 Spring Meeting and 21st Global Congress on Process Safety

(74b) Bootstrap-Based Adaptation of Data-Driven Models to Tackle Step Changes in Process Data

Authors

Richard D. Braatz, Massachusetts Institute of Technology
Ensuring product quality in biomanufacturing processes is vital for the safety and effectiveness of biopharmaceutical products. Constant monitoring of the critical quality attributes (CQAs) and critical process parameters (i.e., process variables) is fundamental to this end.

Process variables are typically measured online at a high sampling rate, which provides adequate resolution for data-driven process monitoring and prompt fault detection. CQAs, on the other hand, often require time-consuming and expensive measurement procedures, so far fewer observations are available (O’Flaherty et al. 2020). Soft sensors (Kadlec et al. 2009; Zhu et al. 2020) can aid quality monitoring by providing real-time predictions of CQAs based on process variables. In biomanufacturing, partial least-squares (PLS) regression (Geladi et al. 1986; Wold et al. 2001) is a widely used data-driven model that enables both prediction of the CQAs and process monitoring.

PLS models can be developed from historical production datasets, which are typically collected as part of the standard operation of most processes. However, because PLS models are built on historical datasets, they capture only a “snapshot” of the process. Prediction and monitoring performance may degrade upon process changes, e.g., slow membrane fouling in downstream processing or variations of the culture medium composition in upstream processing. Changes in the process scale, e.g., a variation in the culture volume in a batch bioreactor, can be particularly detrimental to the model performance due to complex changes in the mixing patterns, gas-liquid mass transfer, and, ultimately, cell behavior. These are well-known challenges in the scale-up of biomanufacturing processes (Facco et al. 2020).

Model adaptation methods (Kadlec et al. 2011) provide a way to handle the changing nature of a process. Moving average and exponential weighting schemes can be combined with PLS modeling to design recursive algorithms (Dayal et al. 1997; Qin 1998) that continuously update the model. Such methods are easy to implement and agnostic to the kind of change affecting the process, which makes them attractive solutions for model adaptation. However, their performance may vary across modes of change (e.g., slow drifts or abrupt steps). In the case of a step change in the data, such methods typically do not perform well right after the change because of the strong imbalance between “old” and “new” data: the few “new” observations may appear as outliers to the model. A similar issue arises later, when only a few “old” observations are left in the memory of the model, so adaptation is typically complete only after all “old” observations have been forgotten. Finally, care must be taken to avoid forgetting valuable information stored in past data when the process operates steadily, i.e., when no change is happening.
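
The lag after a step change can be illustrated with a minimal sketch, under the assumption that exponential weighting is applied to a running mean rather than to a full recursive PLS model. The function names and the forgetting factor `lam` are hypothetical and are not taken from the cited algorithms. With an update of the form m_t = lam * m_(t-1) + (1 - lam) * x_t, the total weight still carried by the “old” data after k “new” observations is lam^k, so adaptation is slow when lam is close to 1.

```python
import math


def ewma_update(mean, x, lam):
    """One exponentially weighted update of a running mean.

    lam is a (hypothetical) forgetting factor in (0, 1): values close to 1
    retain more of the past and adapt more slowly.
    """
    return lam * mean + (1.0 - lam) * x


def samples_to_forget(lam, tol):
    """Smallest k such that the weight lam**k on pre-change data is <= tol."""
    return math.ceil(math.log(tol) / math.log(lam))


# Illustration: the process mean steps from 0 to 1 and the running mean
# starts at the "old" value 0.
lam = 0.9
mean = 0.0
k = samples_to_forget(lam, tol=0.05)
for _ in range(k):
    mean = ewma_update(mean, 1.0, lam)
print(k, round(mean, 4))
```

In this simplified setting the trade-off is explicit: lowering `lam` shortens the lag after a step, but it also discards past information faster during steady operation.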

A key point is that moving average and exponential weighting adaptation make no assumption about the nature of the process change or its location in time. In fact, such methods can be executed even if no process change occurs at all. On the other hand, some process changes are known a priori, e.g., the culture volume of a bioreactor can be increased or decreased in response to a change in the product demand. Such valuable information, i.e., the presence of a well-defined step change in the process, should be exploited.

In this study, we propose a model adaptation strategy tailored to tackle known step changes in the process. Our method combines PLS modeling with bootstrap resampling (Efron et al. 1993) to accelerate the model adaptation. Specifically, we use the bootstrap to artificially augment the dataset after the step change to avoid the aforementioned data imbalance issue. We do so by sampling with replacement from the “new” data to obtain a reasonable number of observations. We then corrupt each observation by adding Gaussian noise with zero mean and covariance consistent with the noise in the “old” data (estimated using a PLS model of the “old” data alone) to obtain a realistic augmented dataset. We demonstrate our approach on a simple numerical case study and on a simulated bioreactor case study, both undergoing a step change meant to simulate a scale-up scenario. We also run the moving average and exponential weighting adaptation schemes on the same case studies. Compared to those methods, the bootstrap-based approach achieves faster adaptation and better, more stable predictive performance in the intermediate period, highlighting the importance of incorporating knowledge of the process change being tackled into the model adaptation procedure.
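
The augmentation step described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `bootstrap_augment`, its arguments, and the use of NumPy are hypothetical, and the residuals of the “old” data are assumed to have been computed beforehand from a PLS model fitted on the “old” data alone.

```python
import numpy as np


def bootstrap_augment(new_data, old_residuals, n_samples, seed=None):
    """Resample "new" observations with replacement and add Gaussian noise.

    new_data      : (n_new, p) array, observations collected after the step change
    old_residuals : (n_old, p) array, residuals of the "old" data (assumed to
                    come from a PLS model fitted elsewhere on the "old" data)
    n_samples     : number of augmented observations to generate
    """
    rng = np.random.default_rng(seed)
    new_data = np.asarray(new_data, dtype=float)
    old_residuals = np.asarray(old_residuals, dtype=float)

    # Noise covariance estimated from the "old" residuals (columns = variables).
    noise_cov = np.atleast_2d(np.cov(old_residuals, rowvar=False))

    # Bootstrap: draw row indices of the "new" data with replacement.
    idx = rng.integers(0, new_data.shape[0], size=n_samples)

    # Zero-mean Gaussian noise with the estimated covariance.
    noise = rng.multivariate_normal(
        np.zeros(new_data.shape[1]), noise_cov, size=n_samples
    )
    return new_data[idx] + noise
```

The augmented array could then be used to refit the model after the step change. How many augmented observations count as a “reasonable number” is not specified in the abstract, so `n_samples` is left as a free parameter here.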

References

Dayal, B. S. and J. F. MacGregor (1997). Recursive exponentially weighted PLS and its applications to adaptive control and prediction. Journal of Process Control 7 (3), 169–179. doi: 10.1016/S0959-1524(97)80001-7.

Efron, B. and R. Tibshirani (1993). An Introduction to the Bootstrap. Boca Raton (FL): Chapman & Hall/CRC.

Facco, P., S. Zomer, R. C. Rowland-Jones, D. Marsh, P. Diaz-Fernandez, G. Finka, F. Bezzo, and M. Barolo (2020). Using data analytics to accelerate biopharmaceutical process scale-up. Biochemical Engineering Journal 164 (9), 107791. doi: 10.1016/j.bej.2020.107791.

Geladi, P. and B. R. Kowalski (1986). Partial Least-Squares Regression: A Tutorial. Analytica Chimica Acta 185, 1–17. doi: 10.1016/0003-2670(86)80028-9.

Kadlec, P., B. Gabrys, and S. Strandt (2009). Data-driven Soft Sensors in the process industry. Computers and Chemical Engineering 33 (4), 795–814. doi: 10.1016/j.compchemeng.2008.12.012.

Kadlec, P., R. Grbić, and B. Gabrys (2011). Review of adaptation mechanisms for data-driven soft sensors. Computers and Chemical Engineering 35 (1), 1–24. doi: 10.1016/j.compchemeng.2010.07.034.

O’Flaherty, R., A. Bergin, E. Flampouri, L. Martins Mota, I. Obaidi, A. Quigley, Y. Xie, and M. Butler (2020). Mammalian cell culture for production of recombinant proteins: A review of the critical steps in their biomanufacturing. Biotechnology Advances 43 (5), 107552. doi: 10.1016/j.biotechadv.2020.107552.

Qin, S. J. (1998). Recursive PLS algorithms for adaptive data modeling. Computers and Chemical Engineering 22 (4–5), 503–514. doi: 10.1016/s0098-1354(97)00262-7.

Wold, S., M. Sjöström, and L. Eriksson (2001). PLS-regression: A basic tool of chemometrics. Chemometrics and Intelligent Laboratory Systems 58 (2), 109–130. doi: 10.1016/S0169-7439(01)00155-1.

Zhu, X., K. U. Rehman, B. Wang, and M. Shahzad (2020). Modern Soft-Sensing Modeling Methods for Fermentation Processes. Sensors 20, 1771. doi: 10.3390/s20061771.