2025 AIChE Annual Meeting

(191ao) Heterogeneous Transfer Learning for Batch and Continuous Manufacturing Transitions in Pharmaceutical Tablet Production

Authors

Keita Yaginuma, Daiichi Sankyo Co., Ltd.
Kanta Sato, Daiichi Sankyo Co., Ltd.
Shota Kato, Kyoto University
Manabu Kano, Kyoto University

1. Introduction

In the pharmaceutical lifecycle, transitions between manufacturing methods are important for ensuring a stable supply of drug products and reducing manufacturing costs. For example, transitioning from batch manufacturing (BM) to continuous manufacturing (CM) can lower the risk of drug shortages and reduce production costs [1].

Machine learning models can support such transitions by helping to identify optimal conditions for the new manufacturing method. However, data scarcity and differences in the variables measured under the old and new methods hinder accurate model construction.

Transfer learning (TL) leverages knowledge from a source domain (SD) to improve model performance in a target domain (TD) where data are limited. Heterogeneous TL applies when the SD and TD have different feature spaces, as in manufacturing method transitions. Kobayashi et al. proposed Frustratingly Easy Heterogeneous Domain Adaptation (FEHDA) [2], a simple yet effective heterogeneous TL approach, and combined it with Gaussian process regression to improve the prediction of toner quality. Yaginuma et al. applied FEHDA to scale-up scenarios in pharmaceutical tableting and demonstrated its effectiveness [3]. Although FEHDA can improve model performance, it substantially increases the number of variables because it expands the variables common to the SD and TD. In data-scarce situations, these additional variables may cause overfitting and degrade prediction performance.

This study proposes a heterogeneous TL approach for manufacturing method transitions, which reduces the number of variables compared to FEHDA. This modification is designed to enhance efficiency and predictive performance under data-scarce conditions. Using real-world data from a tablet manufacturing process, we evaluate our method through two scenarios: a transition from BM to CM and one from CM to BM. We then compare its performance with existing methods, including FEHDA.

2. Methods

2.1 FEHDA

FEHDA is a heterogeneous TL method that combines the data matrices of the SD and the TD into a single matrix, as illustrated in Figure 1. The SD input matrix is divided into common variables Xc(s) and source-unique variables Xu(s), while the TD input matrix is split into common variables Xc(t) and target-unique variables Xu(t). To align the feature spaces and account for domain shift, FEHDA expands each of these variable groups by concatenating zero matrices. Although effective for heterogeneous domains, this expansion increases the number of variables, which can degrade predictive performance when data are limited.
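The augmentation can be sketched in a few lines. The abstract does not spell out the exact column arrangement, so this sketch assumes the layout of Daumé III's "frustratingly easy" domain adaptation: each common variable appears in a shared block plus a source-only and a target-only copy, and each unique block is zero-filled in the domain that lacks it. The function name fehda_augment and the random values are hypothetical; only the dimensions follow Section 3.1 (BM-to-CM, with all 16 CM samples used here for illustration).

```python
import numpy as np


def fehda_augment(Xc_s, Xu_s, Xc_t, Xu_t):
    """Hypothetical FEHDA-style augmentation (column layout assumed, not taken from [2]).

    Columns: [shared common | source-only common | target-only common |
              source-unique | target-unique]
    """
    n_s, n_t = Xc_s.shape[0], Xc_t.shape[0]
    k = Xc_s.shape[1]  # number of common variables
    # SD rows: common variables copied into the shared and source-only blocks
    source = np.hstack([
        Xc_s, Xc_s, np.zeros((n_s, k)),
        Xu_s, np.zeros((n_s, Xu_t.shape[1])),
    ])
    # TD rows: common variables copied into the shared and target-only blocks
    target = np.hstack([
        Xc_t, np.zeros((n_t, k)), Xc_t,
        np.zeros((n_t, Xu_s.shape[1])), Xu_t,
    ])
    return np.vstack([source, target])


rng = np.random.default_rng(0)
Xc_s, Xu_s = rng.normal(size=(15, 6)), rng.normal(size=(15, 7))  # BM as SD (toy values)
Xc_t, Xu_t = rng.normal(size=(16, 6)), rng.normal(size=(16, 9))  # CM as TD (toy values)
X_fehda = fehda_augment(Xc_s, Xu_s, Xc_t, Xu_t)
print(X_fehda.shape)  # (31, 34)
```

Under this assumed layout, the six common variables alone occupy 18 columns, which is the growth in dimensionality discussed above.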

2.2 Proposed Method

The proposed method modifies FEHDA by removing the zero-padded expansion of the common variables, which are kept as a single shared block, while the unique variables are still zero-padded. This simplification reduces the number of variables relative to FEHDA and is intended to improve predictive performance, particularly when data are limited or domain shifts are small.
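Under the same assumed layout, the modification amounts to dropping the two domain-specific copies of the common block. The helper names proposed_augment and n_columns are hypothetical; n_columns compares matrix widths for the dimensions in Section 3.1, and the FEHDA count depends on the layout assumption rather than on a figure reported in this abstract.

```python
import numpy as np


def proposed_augment(Xc_s, Xu_s, Xc_t, Xu_t):
    """Columns: [common | source-unique | target-unique] (layout assumed for illustration).

    Common variables form one shared block with no zero-padded copies;
    only the unique blocks are zero-filled in the domain that lacks them.
    """
    n_s, n_t = Xc_s.shape[0], Xc_t.shape[0]
    source = np.hstack([Xc_s, Xu_s, np.zeros((n_s, Xu_t.shape[1]))])
    target = np.hstack([Xc_t, np.zeros((n_t, Xu_s.shape[1])), Xu_t])
    return np.vstack([source, target])


def n_columns(k_common, k_src, k_tgt):
    """Widths of the assumed FEHDA layout and of the proposed layout."""
    return {"FEHDA": 3 * k_common + k_src + k_tgt, "Proposed": k_common + k_src + k_tgt}


rng = np.random.default_rng(0)
Xc_s, Xu_s = rng.normal(size=(15, 6)), rng.normal(size=(15, 7))  # BM as SD (toy values)
Xc_t, Xu_t = rng.normal(size=(16, 6)), rng.normal(size=(16, 9))  # CM as TD (toy values)
X_prop = proposed_augment(Xc_s, Xu_s, Xc_t, Xu_t)
print(X_prop.shape)          # (31, 22)
print(n_columns(6, 7, 9))    # {'FEHDA': 34, 'Proposed': 22}
```

With roughly 30 samples in total, going from 34 to 22 candidate columns is a sizable reduction, which is the motivation stated above.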

2.3 Comparison Methods

Two baseline methods are used for comparison. The both-domain method (BD) uses data from both the SD and the TD but only the variables common to them. The only-target method (OT) uses TD data alone. Figure 1 also illustrates the matrix structures of BD and OT.
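For completeness, the two baselines can be sketched with the same toy dimensions. The function names are hypothetical, and OT is assumed to use every TD variable (common plus target-unique), which the abstract implies but does not state explicitly.

```python
import numpy as np


def both_domain(Xc_s, y_s, Xc_t, y_t):
    """BD: pool SD and TD samples, restricted to the common variables."""
    return np.vstack([Xc_s, Xc_t]), np.concatenate([y_s, y_t])


def only_target(Xc_t, Xu_t, y_t):
    """OT: TD samples only; assumed to use every TD variable (common + unique)."""
    return np.hstack([Xc_t, Xu_t]), y_t


rng = np.random.default_rng(0)
Xc_s, y_s = rng.normal(size=(15, 6)), rng.normal(size=15)  # toy SD (BM) data
Xc_t, Xu_t, y_t = rng.normal(size=(16, 6)), rng.normal(size=(16, 9)), rng.normal(size=16)
X_bd, y_bd = both_domain(Xc_s, y_s, Xc_t, y_t)
X_ot, y_ot = only_target(Xc_t, Xu_t, y_t)
print(X_bd.shape, X_ot.shape)  # (31, 6) (16, 15)
```

BD gains samples but discards all unique variables, while OT keeps all TD variables but has only TD samples; the augmented layouts sit between these extremes.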

3. Experiments

3.1 Dataset

We used CM and BM datasets from a tablet manufacturing process at Daiichi Sankyo Co., Ltd., consisting of 16 CM and 15 BM samples. Process parameters (PPs) and the magnesium stearate (MgSt) composition were measured. The critical quality attribute (CQA) was the dissolution rate of mefenamic acid. The CM and BM processes shared six variables; in addition, CM had nine unique variables and BM had seven.

3.2 Problem Setting

We designed two problem settings for evaluation: BM-to-CM (BM as SD and CM as TD) and CM-to-BM (CM as SD and BM as TD).

3.3 Model Construction

We used partial least squares regression (PLSR) [4] with variable selection based on variable importance in projection (VIP) scores. The input matrix consisted of the PPs and the MgSt composition, and the output was the CQA. For each method (Proposed, FEHDA, BD, and OT), models were trained on the SD data and the TD training data, except for OT, which used the TD training data only. Model performance was evaluated by the root mean square error (RMSE) on the TD test data. Five-fold cross-validation was used to determine the optimal number of latent variables and to perform variable selection. This procedure was repeated ten times with different random splits.
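A minimal NumPy sketch of this modeling step follows: single-output PLS fitted by NIPALS, VIP scores computed from the weights, scores, and y-loadings, the number of latent variables chosen by five-fold cross-validation RMSE, and variables retained when VIP >= 1. The VIP threshold of 1, the order of latent-variable tuning and VIP screening, and all function names are assumptions, since the abstract does not specify them; the outer loop over ten random train/test splits is omitted, and the data are synthetic.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """PLS1 (single output) via NIPALS on autoscaled X and mean-centered y."""
    x_mean, x_std = X.mean(axis=0), X.std(axis=0, ddof=1)
    x_std[x_std == 0] = 1.0  # guard against constant columns within a fold
    y_mean = y.mean()
    Xr, yr = (X - x_mean) / x_std, y - y_mean
    W, P, T, q = [], [], [], []
    for _ in range(n_lv):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)    # unit-length X weights
        t = Xr @ w                # X scores
        tt = t @ t
        p = Xr.T @ t / tt         # X loadings
        qa = (yr @ t) / tt        # y loading
        Xr = Xr - np.outer(t, p)  # deflate X
        yr = yr - qa * t          # deflate y
        W.append(w)
        P.append(p)
        T.append(t)
        q.append(qa)
    W, P, T, q = np.column_stack(W), np.column_stack(P), np.column_stack(T), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # regression vector for autoscaled X
    return {"x_mean": x_mean, "x_std": x_std, "y_mean": y_mean,
            "coef": coef, "W": W, "T": T, "q": q}


def pls1_predict(model, X):
    """Predict y for new rows using the stored scaling and regression vector."""
    return ((X - model["x_mean"]) / model["x_std"]) @ model["coef"] + model["y_mean"]


def vip_scores(model):
    """VIP_j = sqrt(p * sum_a SS_a w_ja^2 / sum_a SS_a), SS_a = q_a^2 t_a't_a.

    Assumes unit-length weight columns, which pls1_fit guarantees.
    """
    W, T, q = model["W"], model["T"], model["q"]
    ss = q ** 2 * np.sum(T ** 2, axis=0)  # y variance explained by each LV
    return np.sqrt(W.shape[0] * (W ** 2 @ ss) / ss.sum())


def cv_rmse(X, y, n_lv, n_folds=5, seed=0):
    """k-fold CV RMSE; a fixed seed gives every candidate n_lv the same folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = []
    for test_idx in np.array_split(idx, n_folds):
        train_idx = np.setdiff1d(idx, test_idx)
        model = pls1_fit(X[train_idx], y[train_idx], n_lv)
        sq_err.extend((pls1_predict(model, X[test_idx]) - y[test_idx]) ** 2)
    return float(np.sqrt(np.mean(sq_err)))


def fit_with_vip_selection(X, y, max_lv=4, vip_threshold=1.0):
    """Choose LVs by CV, keep variables with VIP >= threshold, then re-tune and refit."""
    best_lv = min(range(1, max_lv + 1), key=lambda a: cv_rmse(X, y, a))
    keep = vip_scores(pls1_fit(X, y, best_lv)) >= vip_threshold
    Xs = X[:, keep]
    lv_sel = min(range(1, min(max_lv, Xs.shape[1]) + 1), key=lambda a: cv_rmse(Xs, y, a))
    return keep, pls1_fit(Xs, y, lv_sel)


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6))
y = 2.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=40)  # toy data, not the study's
keep, model = fit_with_vip_selection(X, y)
print("selected variables:", np.flatnonzero(keep))
```

In the study's setting, X would be one of the augmented or baseline matrices built from the SD and TD training rows, and the test RMSE would be computed with pls1_predict on the held-out TD rows.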

4. Results and Discussion

The results demonstrated the effectiveness of the proposed method in both the BM-to-CM and CM-to-BM settings. In the BM-to-CM setting, the proposed method achieved the lowest test RMSE, 4.91% ± 0.84%, outperforming FEHDA (6.28% ± 1.85%), BD (6.78% ± 1.12%), and OT (5.78% ± 1.29%). Its training RMSE was 2.60% ± 0.87%, compared with 2.89% ± 1.02% for FEHDA and 2.75% ± 0.78% for OT. The gap between the mean training and test RMSEs was therefore smaller for the proposed method (2.31%) than for FEHDA (3.39%) and OT (3.03%).

Similarly, in the CM-to-BM setting, the proposed method attained a test RMSE of 4.55% ± 1.92%, outperforming FEHDA (6.00% ± 2.68%), BD (6.08% ± 1.72%), and OT (7.01% ± 2.84%). The gap between the mean training and test RMSEs was again smaller for the proposed method (2.50%) than for FEHDA (4.05%) and OT (4.84%). These results suggest that the proposed method mitigated overfitting, thereby improving generalization and predictive performance.
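The reported gaps are differences between the mean test and mean training RMSEs, so they can be checked directly from the numbers above. For CM-to-BM, where training RMSEs are not listed, the same arithmetic gives the implied mean training RMSEs; these are derived values, not additional results.

```python
# Mean RMSEs (%) as reported above. BM-to-CM lists (train, test) for three methods.
bm_to_cm = {"Proposed": (2.60, 4.91), "FEHDA": (2.89, 6.28), "OT": (2.75, 5.78)}
gaps = {m: round(test - train, 2) for m, (train, test) in bm_to_cm.items()}
print(gaps)  # {'Proposed': 2.31, 'FEHDA': 3.39, 'OT': 3.03}

# CM-to-BM lists (test, gap); the implied mean training RMSE is test - gap.
cm_to_bm = {"Proposed": (4.55, 2.50), "FEHDA": (6.00, 4.05), "OT": (7.01, 4.84)}
implied_train = {m: round(test - gap, 2) for m, (test, gap) in cm_to_bm.items()}
print(implied_train)  # {'Proposed': 2.05, 'FEHDA': 1.95, 'OT': 2.17}
```

On these reported means, FEHDA has the lowest implied training RMSE in the CM-to-BM setting but a higher test RMSE than the proposed method, which is consistent with the overfitting interpretation.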

Figure 2 shows a heatmap of the variables selected by the proposed method and FEHDA in the CM-to-BM setting. The heatmap shows that FEHDA selected more variables than the proposed method. The larger feature space of FEHDA makes variable selection more difficult in small-sample scenarios such as this study. In contrast, the proposed method starts with a smaller feature space, which allows more consistent variable selection even with limited samples. This characteristic likely contributes to reduced overfitting and improved prediction performance.
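One way to quantify selection consistency across the ten repetitions is to compute how often each variable is selected (the kind of quantity a selection heatmap can display) together with the mean pairwise Jaccard similarity of the selected sets. This stability index is a swapped-in illustration, not the analysis reported in the study, and the masks below are toy data rather than values from Figure 2.

```python
import itertools

import numpy as np


def selection_frequency(masks):
    """Fraction of repeated runs in which each variable was selected."""
    return np.asarray(masks, dtype=bool).mean(axis=0)


def mean_jaccard(masks):
    """Mean pairwise Jaccard similarity between selected-variable sets (1.0 = identical)."""
    masks = np.asarray(masks, dtype=bool)
    scores = []
    for a, b in itertools.combinations(masks, 2):
        union = np.sum(a | b)
        scores.append(np.sum(a & b) / union if union else 1.0)
    return float(np.mean(scores))


# Toy selection masks: three repeats over five variables (hypothetical)
stable = [[1, 1, 0, 0, 0], [1, 1, 0, 0, 0], [1, 1, 0, 0, 0]]
unstable = [[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1]]
print(selection_frequency(stable), mean_jaccard(stable), mean_jaccard(unstable))
```

A method whose selected sets change from split to split scores low on this index even if each individual model fits its training data well.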

5. Conclusion

This study proposed a heterogeneous TL method for manufacturing method transitions between BM and CM. Applied to a tablet production dataset, the proposed method outperformed existing approaches, including FEHDA, achieving better predictive performance and less overfitting, which we attribute to more effective variable selection.

The proposed method may encounter challenges when there are large domain shifts, either in the variable distributions or in the relationships between the variables and the output. In such cases, methods such as FEHDA may be more suitable.

Future work will involve generating synthetic datasets under varying conditions, such as the number of TD samples and the magnitude of domain shift. The proposed method and FEHDA will be applied to these datasets to compare their predictive performance and to clarify the strengths and limitations of the proposed method.

References

[1] Malevez, D., & Copot, D. (2021). From batch to continuous tablet manufacturing: A control perspective. IFAC-PapersOnLine, 54(15), 562–567.

[2] Kobayashi, S., Miyakawa, M., Takemasa, S., Takahashi, N., Watanabe, Y., Satoh, T., & Kano, M. (2022). Transfer learning for quality prediction in a chemical toner manufacturing process. In Y. Yamashita & M. Kano (Eds.), Computer Aided Chemical Engineering (Vol. 49, pp. 1663–1668). Elsevier.

[3] Yaginuma, K., Matsunami, K., Descamps, L., Ryckaert, A., & De Beer, T. (2024). Hybrid modeling of T-shaped partial least squares regression and transfer learning for formulation and manufacturing process development of new drug products. International Journal of Pharmaceutics, 662, 124463.

[4] Geladi, P., & Kowalski, B. R. (1986). Partial least-squares regression: A tutorial. Analytica Chimica Acta, 185, 1–17.