2024 AIChE Annual Meeting

(343e) Wavelength Selection Algorithm Robust Against Impurities for Sample-Efficient PAT

Authors

Kobayashi, S. - Presenter, Kyoto University
Kato, S., Kyoto University
Kano, M., Kyoto University

Introduction

Near-infrared (NIR) spectroscopy can analyze samples faster than conventional methods such as high-performance liquid chromatography (HPLC) and gas chromatography (GC). However, building calibration models for NIR spectroscopy requires large amounts of material and time. To address this problem, pure component-based approaches such as classical least squares (CLS) [1] and iterative optimization technology (IOT) [2] have attracted growing interest in the pharmaceutical industry because of their cost-effectiveness. These approaches, however, may fail when impurities are present, because they assume that the samples contain no impurities. In the present research, we propose a new wavelength selection method that makes pure component-based approaches robust against impurities while requiring as few samples as possible.

Conventional pure component-based method

Both CLS and IOT are pure component-based modeling approaches.

CLS is a regression method based on the Beer-Lambert law. Its regression coefficients are determined from the pure component spectra. However, the concentrations predicted by CLS are unconstrained, so CLS can produce physically unrealistic results such as negative concentrations.
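
As a minimal sketch of CLS, assuming noise-free Beer-Lambert mixing in which the mixture spectrum equals a concentration-weighted sum of the pure spectra (path length folded into the spectra), the concentrations can be estimated by ordinary linear least squares. The function name `cls_predict` and the Gaussian example spectra are hypothetical. Because the solution is unconstrained, nothing in the formula prevents it from returning negative values.

```python
import numpy as np


def cls_predict(pure_spectra: np.ndarray, mixture_spectrum: np.ndarray) -> np.ndarray:
    """Classical least squares (CLS) estimate of component concentrations.

    pure_spectra: (K, P) array, one known pure-component spectrum per row.
    mixture_spectrum: (P,) array, the measured mixture spectrum.
    Returns a (K,) array. The estimate is unconstrained, so entries may be
    negative and need not sum to 1.
    """
    # Beer-Lambert: mixture ~= pure_spectra.T @ c, so solve for c in the
    # least-squares sense, i.e. c = (S S^T)^-1 S a.
    coef, *_ = np.linalg.lstsq(pure_spectra.T, mixture_spectrum, rcond=None)
    return coef


# Hypothetical example: two Gaussian pure spectra on a normalized axis.
wavelengths = np.linspace(0.0, 1.0, 201)
pure = np.vstack([
    np.exp(-((wavelengths - 0.3) / 0.05) ** 2),
    np.exp(-((wavelengths - 0.7) / 0.05) ** 2),
])
mixture = 0.4 * pure[0] + 0.6 * pure[1]
print(cls_predict(pure, mixture))  # approximately [0.4 0.6]
```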

IOT was developed to address this problem of CLS. Instead of building a regression model, IOT solves a constrained optimization problem to make predictions, which prevents unrealistic values from being generated. Typical constraints include:
C1. The mole fraction of any pure component is greater than or equal to 0 and less than or equal to 1.
C2. The sum of the mole fractions of the known pure components is equal to 1.
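
A minimal sketch of the constrained prediction step follows, assuming IOT can be posed as fitting the mixture spectrum with a mole-fraction-weighted sum of the known pure spectra subject to C1 and C2. The least-squares objective, the use of SciPy's SLSQP solver, and the function name `iot_predict` are assumptions for illustration; [2] describes the actual IOT algorithm, which may differ in its objective and solver.

```python
import numpy as np
from scipy.optimize import minimize


def iot_predict(pure_spectra: np.ndarray, mixture_spectrum: np.ndarray) -> np.ndarray:
    """Constrained mole-fraction fit with C1 (0 <= x <= 1) and C2 (sum(x) == 1).

    pure_spectra: (K, P) known pure-component spectra, one per row.
    mixture_spectrum: (P,) measured mixture spectrum.
    """
    k = pure_spectra.shape[0]

    def residual(x):
        # Difference between the measured and the reconstructed spectrum.
        return mixture_spectrum - x @ pure_spectra

    def objective(x):
        r = residual(x)
        return float(r @ r)

    def gradient(x):
        # d/dx ||a - S^T x||^2 = -2 S (a - S^T x)
        return -2.0 * (pure_spectra @ residual(x))

    result = minimize(
        objective,
        x0=np.full(k, 1.0 / k),  # equal fractions: feasible for C1 and C2
        jac=gradient,
        method="SLSQP",
        bounds=[(0.0, 1.0)] * k,  # constraint C1
        constraints=[{"type": "eq", "fun": lambda x: np.sum(x) - 1.0}],  # constraint C2
    )
    return result.x


# Hypothetical two-component example with no impurity.
wavelengths = np.linspace(0.0, 1.0, 201)
pure = np.vstack([
    np.exp(-((wavelengths - 0.3) / 0.05) ** 2),
    np.exp(-((wavelengths - 0.7) / 0.05) ** 2),
])
x_hat = iot_predict(pure, 0.4 * pure[0] + 0.6 * pure[1])
print(x_hat)
```

Passing the analytic gradient through `jac` lets SLSQP converge tightly on this quadratic objective; without it, SciPy falls back to finite differences.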

Proposed method

IOT assumes that no impurities are present. When impurities are generated by side reactions, however, the sum of the mole fractions of the known pure components is less than 1, which violates constraint C2. To solve this problem and make IOT robust against impurities, we propose using IOT without constraint C2.
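
Under the same assumed least-squares formulation, dropping C2 while keeping C1 turns the prediction into a bounded linear least-squares problem, which SciPy's `scipy.optimize.lsq_linear` solves directly. This is a sketch, not the authors' implementation; the function name `iot_without_c2`, the band positions, and the 20% impurity in the example are hypothetical.

```python
import numpy as np
from scipy.optimize import lsq_linear


def iot_without_c2(pure_spectra: np.ndarray, mixture_spectrum: np.ndarray) -> np.ndarray:
    """Fit known-component mole fractions under C1 only (0 <= x <= 1).

    Without C2, the fitted fractions may sum to less than 1, which leaves
    room for unknown impurities in the mixture.
    """
    result = lsq_linear(pure_spectra.T, mixture_spectrum, bounds=(0.0, 1.0))
    return result.x


wavelengths = np.linspace(0.0, 1.0, 201)


def gaussian(center: float, width: float = 0.05) -> np.ndarray:
    # Hypothetical Gaussian band used only for this illustration.
    return np.exp(-((wavelengths - center) / width) ** 2)


known = np.vstack([gaussian(0.3), gaussian(0.7)])  # materials A and B
# Mixture: 30% A, 50% B, and 20% of an impurity whose band barely overlaps them.
mixture = 0.3 * known[0] + 0.5 * known[1] + 0.2 * gaussian(0.5)
x_hat = iot_without_c2(known, mixture)
print(x_hat, x_hat.sum())
```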

In addition, to predict mole fractions accurately even when impurities are present, we propose a new wavelength selection method that requires only a few samples, referred to as sample-efficient wavelength selection (SEWS). Each sample consists of the spectrum of a mixture and the concentrations of the known pure components in that mixture. The SEWS procedure is as follows.

1. Remove the influence of the known pure components from the mixture spectra. The mixtures consist of known pure components and impurities; the spectra of the known pure components are given, whereas those of the impurities are unknown. The number of mixture spectra, N, equals the number of samples. The remaining spectra are denoted as A_other.

2. Calculate the variance of A_other at each wavelength.

3. Select the wavelengths at which the variance of A_other does not exceed a given threshold. The other wavelengths are considered unsuitable for a pure component-based approach and are not used in model construction.
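
The three SEWS steps can be sketched in NumPy as follows. Step 1 is implemented here by subtracting the Beer-Lambert contribution of the known components (their mole fractions times their pure spectra), which is an assumption about how the influence is removed; the threshold value, the band positions, and the Dirichlet-sampled fractions are likewise hypothetical.

```python
import numpy as np


def sews_select(mixture_spectra, known_fractions, known_pure_spectra, threshold):
    """Return a boolean mask of the wavelengths kept by SEWS.

    mixture_spectra: (N, P) spectra of the N mixture samples (N >= 2).
    known_fractions: (N, K) mole fractions of the K known components.
    known_pure_spectra: (K, P) pure spectra of the known components.
    threshold: variance cutoff (a tuning parameter; its value is problem-specific).
    """
    # Step 1: remove the Beer-Lambert contribution of the known components.
    a_other = mixture_spectra - known_fractions @ known_pure_spectra
    # Step 2: sample variance of the remainder across the N samples, per wavelength.
    variance = a_other.var(axis=0, ddof=1)
    # Step 3: keep wavelengths whose variance does not exceed the threshold.
    return variance <= threshold


# Hypothetical data: A at 0.3 and B at 0.7 (known), impurity at 0.5 (unknown).
wavelengths = np.linspace(0.0, 1.0, 201)
bands = np.vstack([np.exp(-((wavelengths - c) / 0.05) ** 2) for c in (0.3, 0.7, 0.5)])
rng = np.random.default_rng(0)
fractions = rng.dirichlet(np.ones(3), size=10)  # N = 10 samples of A, B, impurity
spectra = fractions @ bands                      # noise-free mixtures
keep = sews_select(spectra, fractions[:, :2], bands[:2], threshold=1e-4)
print(wavelengths[~keep].min(), wavelengths[~keep].max())  # range near the impurity band
```

Wavelengths where the impurity absorbs inherit the sample-to-sample variation of its mole fraction, so their variance is large; elsewhere the remainder is close to constant.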

In this work, we combine IOT without constraint C2 with SEWS to predict the mole fractions accurately in the presence of impurities. This method is called SEWS-IOT.
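
A minimal sketch of SEWS-IOT under the same assumptions (known-component subtraction for SEWS step 1, bounded least squares for IOT without C2) is shown below. The function name, the threshold, the random seed, and the simulated bands, including an impurity band that partially overlaps the peak of material A, are hypothetical. Setting `threshold=np.inf` keeps every wavelength and so reproduces IOT without C2 and without wavelength selection, which serves as the baseline.

```python
import numpy as np
from scipy.optimize import lsq_linear


def sews_iot_predict(known_pure_spectra, train_spectra, train_fractions,
                     test_spectra, threshold):
    """Predict known-component mole fractions with SEWS, then IOT without C2."""
    # SEWS steps 1-3: remainder after removing the known components, its
    # variance over the N training samples, and a threshold on that variance.
    remainder = train_spectra - train_fractions @ known_pure_spectra
    keep = remainder.var(axis=0, ddof=1) <= threshold
    # IOT without C2 (bounds 0..1 only), restricted to the selected wavelengths.
    design = known_pure_spectra[:, keep].T
    return np.array([
        lsq_linear(design, spectrum[keep], bounds=(0.0, 1.0)).x
        for spectrum in test_spectra
    ])


wavelengths = np.linspace(0.0, 1.0, 201)
known = np.vstack([np.exp(-((wavelengths - c) / 0.05) ** 2) for c in (0.30, 0.70)])
impurity = np.exp(-((wavelengths - 0.38) / 0.05) ** 2)  # overlaps A's peak
rng = np.random.default_rng(1)


def simulate(n):
    # Mole fractions of A, B, and the impurity; mixtures follow Beer-Lambert.
    f = rng.dirichlet(np.ones(3), size=n)
    return f[:, :2], f[:, :2] @ known + f[:, 2:] * impurity


train_frac, train_spec = simulate(10)
test_frac, test_spec = simulate(100)


def rmse_a(pred):
    # RMSE of the predicted mole fraction of material A.
    return float(np.sqrt(np.mean((pred[:, 0] - test_frac[:, 0]) ** 2)))


rmse_all = rmse_a(sews_iot_predict(known, train_spec, train_frac, test_spec, np.inf))
rmse_sews = rmse_a(sews_iot_predict(known, train_spec, train_frac, test_spec, 1e-4))
print(rmse_all, rmse_sews)
```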

Results and Discussion

To verify the performance of SEWS-IOT, simulated data that meet the following conditions were generated.

1. There are three components: materials A, B, and C. The pure spectra of materials A and B are known, but the pure spectrum of material C is unknown.

2. The peaks of the pure spectra of materials A and B partially overlap with the peaks of the pure spectrum of material C.

3. The spectrum of a mixture follows the Beer-Lambert law.
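
A dataset satisfying conditions 1 to 3 can be generated, for example, from Gaussian absorption bands. The band positions and widths, the uniform sampling of mole fractions from the simplex, the optional noise term, and the function names are assumptions for illustration; the abstract does not state how the simulated spectra were actually constructed.

```python
import numpy as np


def gaussian_band(wavelengths, center, width):
    """Hypothetical Gaussian absorption band with unit peak height."""
    return np.exp(-((wavelengths - center) / width) ** 2)


def simulate_dataset(n_samples, rng, n_wavelengths=201, noise_sd=0.0):
    """Simulate mixtures of A, B (known) and C (unknown).

    Returns (wavelengths, known_pure_spectra, mixture_spectra, known_fractions).
    The pure spectrum of C is used to build the mixtures but is not returned,
    mimicking condition 1.
    """
    wavelengths = np.linspace(0.0, 1.0, n_wavelengths)
    # Condition 2: C's band lies between A's and B's and partially overlaps both.
    pure = np.vstack([
        gaussian_band(wavelengths, 0.40, 0.05),  # material A (known)
        gaussian_band(wavelengths, 0.60, 0.05),  # material B (known)
        gaussian_band(wavelengths, 0.50, 0.05),  # material C (unknown)
    ])
    # Mole fractions drawn uniformly from the simplex, so each row sums to 1.
    fractions = rng.dirichlet(np.ones(3), size=n_samples)
    # Condition 3: Beer-Lambert mixing, a fraction-weighted sum of pure spectra.
    spectra = fractions @ pure
    if noise_sd > 0:
        spectra = spectra + rng.normal(0.0, noise_sd, size=spectra.shape)
    return wavelengths, pure[:2], spectra, fractions[:, :2]


wl, known, mixtures, known_fractions = simulate_dataset(10, np.random.default_rng(0))
print(mixtures.shape)  # (10, 201)
```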

The sample size N used for wavelength selection was 0, 5, or 10, and the test set used to evaluate the prediction model contained 100 samples. SEWS-IOT was used to predict the mole fraction of material A. This experiment was repeated 10 times with different datasets (random seeds), and the RMSE averaged over the repetitions was used to evaluate performance.
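
The evaluation metric can be written as a small helper. This is a sketch assuming that "average RMSE" means the mean of the per-repetition RMSE values over the 10 random seeds, rather than a single RMSE pooled over all 1,000 test predictions; the function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root-mean-square error between true and predicted mole fractions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def average_rmse(runs) -> float:
    """Mean of per-run RMSE values; runs is an iterable of (y_true, y_pred)."""
    return float(np.mean([rmse(t, p) for t, p in runs]))


# Two hypothetical repetitions: one with unit errors, one exact.
print(average_rmse([([0.0, 0.0], [1.0, 1.0]), ([0.5], [0.5])]))  # 0.5
```

Averaging per-run values weights each random seed equally; since every run here uses 100 test samples, the pooled alternative differs only in taking the square root after, rather than before, averaging.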

When SEWS was not used (i.e., N = 0, IOT without constraint C2 and without wavelength selection), the average RMSE was 3.0×10⁻². The average RMSE decreased as N increased: it was 2.1×10⁻² for N = 5 and 1.5×10⁻² for N = 10. Thus, with only 10 samples, SEWS-IOT reduced the average RMSE by 50% compared with IOT without constraint C2. These results demonstrate the effectiveness of SEWS as a sample-efficient wavelength selection method.
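
The relative reductions follow directly from the reported averages, taking N = 0 as the reference:

```latex
\frac{3.0\times10^{-2} - 2.1\times10^{-2}}{3.0\times10^{-2}} = 0.30 \quad (N = 5),
\qquad
\frac{3.0\times10^{-2} - 1.5\times10^{-2}}{3.0\times10^{-2}} = 0.50 \quad (N = 10)
```

That is, a 30% reduction with 5 samples and a 50% reduction with 10 samples.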

Conclusion

We proposed SEWS, a sample-efficient wavelength selection method robust against impurities, and demonstrated the effectiveness of SEWS-IOT, which uses SEWS to enhance the prediction accuracy of IOT. With just 10 samples for wavelength selection, SEWS-IOT reduced the average RMSE by 50% compared with IOT without constraint C2. The proposed method allows process analytical technology (PAT) to be realized economically even when conventional pure component-based methods fail because of impurities. In addition, it makes it easier to apply spectroscopic analysis such as NIR spectroscopy to chemical reactions in which impurities are generated.

Acknowledgement

This work was supported by the New Energy and Industrial Technology Development Organization (NEDO) under Grant No. JPNP19004.

References

[1] Adam, J., et al.: Comparison Between Pure Component Modeling Approaches for Monitoring Pharmaceutical Powder Blends with Near-Infrared Spectroscopy in Continuous, AAPS Journal, 24 (2022)
[2] Muteki, K., et al.: Using Iterative Optimization Technology (Calibration-Free/Minimum Approach), Industrial and Engineering Chemistry Research, 52, 12258–12268 (2013)