2025 AIChE Annual Meeting

(394r) Machine Learning-Based System for Real-Time Quantification of Heavy Metals across Diverse Solution Matrices Using Spectroscopy of Plasmas in Liquids

Industrial wastewater containing heavy metals is a major source of environmental pollution. Without continuous real-time monitoring, harmful discharges may go undetected, leading to regulatory violations, environmental damage, and public health risks. In real-world applications, fluctuating sample matrices, particularly variations in conductivity and ionic composition, are among the key challenges to accurate heavy metal quantification. Traditional methods often rely on sample preparation, such as acidification, to standardize the sample matrix and ensure reliable measurements. However, these procedures are labor-intensive and impractical for real-time applications, and they remain a major barrier to the practical implementation of real-time monitoring systems.

In this work, we demonstrate a machine learning (ML)-based approach, combined with plasma spectroscopy, that is capable of real-time quantification of heavy metals in aqueous solutions with varying solution matrices. Plasma is generated directly in the liquid via a pulsed high-voltage discharge, and the resulting emission spectra are recorded by a spectrometer operated through a Raspberry Pi–based control system. Spectral data were collected from prepared aqueous samples containing Cu, Ni, Zn, and Pb, which were added simultaneously to reflect the multi-metal contamination commonly found in real-world wastewater. To capture the complexity of real-world solution matrices, conductivity was varied between 3000 and 4000 μS/cm. Predictions were made using calibration curves, regression planes, and ML models. Predictions based on pre-developed calibration curves can result in relative errors of up to 70%. Incorporating conductivity as an additional variable in a regression plane reduces the error to 33%, but this approach depends heavily on precise conductivity measurements, where even small deviations can significantly degrade prediction accuracy.
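The difference between the two baseline approaches can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single emission-line intensity per sample and a hypothetical additive conductivity effect on that intensity, and all data, coefficients, and function names below are synthetic and chosen only for illustration.

```python
import numpy as np

# Synthetic, hypothetical data: the intensity depends on both the metal
# concentration and the solution conductivity (assumed additive matrix effect).
rng = np.random.default_rng(0)
n = 200
conc = rng.uniform(1.0, 10.0, n)                 # concentration, arbitrary units
conductivity = rng.uniform(3000.0, 4000.0, n)    # uS/cm, range taken from the text
intensity = (100.0 * conc
             + 0.5 * (conductivity - 3000.0)
             + rng.normal(0.0, 5.0, n))          # measurement noise

def fit_linear(features, target):
    """Least-squares fit of target ~ sum(w_i * feature_i) + intercept."""
    A = np.column_stack(list(features) + [np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return coef

def predict_linear(coef, features):
    """Apply a fitted linear model to the given feature arrays."""
    A = np.column_stack(list(features) + [np.ones(len(features[0]))])
    return A @ coef

def mean_relative_error(actual, predicted):
    """Mean absolute relative error, in percent."""
    return 100.0 * np.mean(np.abs(predicted - actual) / actual)

# Calibration curve: concentration from intensity alone.
curve = fit_linear([intensity], conc)
# Regression plane: concentration from intensity and conductivity.
plane = fit_linear([intensity, conductivity], conc)

# Errors are evaluated on the fitting data, which is enough for a comparison.
curve_err = mean_relative_error(conc, predict_linear(curve, [intensity]))
plane_err = mean_relative_error(conc, predict_linear(plane, [intensity, conductivity]))
print(f"calibration curve: {curve_err:.1f}%  regression plane: {plane_err:.1f}%")
```

In this toy setting the plane recovers the concentration well only because the conductivity fed to it is exact; perturbing `conductivity` before calling `predict_linear` shows how sensitive the plane is to errors in that measurement.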

To eliminate the need for conductivity measurements and further improve prediction accuracy, three ML architectures were tested: an artificial neural network (ANN), a convolutional neural network (CNN), and a temporal convolutional transformer (TCT). These models rely solely on full-spectrum emission data as input and require no additional information such as conductivity. All three models outperformed calibration curves and regression planes, providing more accurate predictions across varying solution matrices. Among them, the TCT achieved the highest performance, consistently maintaining a mean absolute percentage error (MAPE) below 10% for multiple metals. This is a considerable improvement over calibration curves and regression planes, which showed MAPE values exceeding 30% and 15%, respectively. To enhance interpretability and ensure consistency with domain knowledge, we applied an explainable artificial intelligence (XAI) method called occlusion-based Raman spectral feature extraction (ORSFE). This technique identifies key spectral features, highlights the importance of characteristic peaks in metal quantification, and emphasizes the value of full-spectrum analysis. By revealing the most important spectral regions, ORSFE improves interpretability and makes the prediction process more reliable.
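The details of ORSFE are not given here, so the following is only a generic sketch of occlusion-based attribution scored with MAPE, not the authors' method. It masks one window of spectral channels at a time and records how much the MAPE increases. The function names, the window size, and the `toy_model` stand-in (which reads a single hypothetical peak at channels 40 to 49) are assumptions made for illustration.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_pred - y_true) / y_true))

def occlusion_importance(predict_fn, spectra, y_true, window=10, fill=0.0):
    """Increase in MAPE when each window of channels is replaced by `fill`."""
    base = mape(y_true, predict_fn(spectra))
    n_channels = spectra.shape[1]
    scores = np.zeros(n_channels)
    for start in range(0, n_channels, window):
        occluded = spectra.copy()
        occluded[:, start:start + window] = fill
        scores[start:start + window] = mape(y_true, predict_fn(occluded)) - base
    return scores

# Hypothetical stand-in for a trained model: the predicted concentration is
# the mean intensity of a single "characteristic peak" at channels 40-49.
def toy_model(spectra):
    return spectra[:, 40:50].sum(axis=1) / 10.0

rng = np.random.default_rng(0)
conc = rng.uniform(1.0, 10.0, size=50)
spectra = rng.uniform(0.0, 0.1, size=(50, 100))  # weak synthetic background
spectra[:, 40:50] = conc[:, None]                # synthetic peak scales with conc

scores = occlusion_importance(toy_model, spectra, conc, window=10)
print("most important channel window starts at", int(np.argmax(scores)))
```

With a real model, `predict_fn` would wrap the trained network's inference call, and the resulting scores could be compared against known emission lines of the target metals as a consistency check.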

Additionally, to demonstrate the potential of machine learning for online monitoring, we conducted a continuous experiment in which spectral data were collected as the solution conductivity gradually increased over time. While pre-developed calibration curves produced relative errors as large as -70% under these conditions, the TCT model maintained prediction errors within ±20% of actual values, meeting regulatory accuracy requirements and demonstrating accurate performance across a range of solution matrices. Without requiring conductivity as an input, the TCT model captured matrix effects directly from the spectral data, confirming that plasma emission spectra inherently carry information about the solution environment. This approach eliminates the need for sample preparation and enables accurate, real-time quantification of heavy metal concentrations.
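The signed relative error and the ±20% acceptance band quoted above can be expressed as a short sketch. The function names and the example values are hypothetical and are not data from the experiment.

```python
import numpy as np

def signed_relative_error(actual, predicted):
    """Signed relative error in percent; negative means under-prediction."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * (predicted - actual) / actual

def within_tolerance(actual, predicted, tol_pct=20.0):
    """Boolean mask of predictions whose absolute relative error is within tol_pct."""
    return np.abs(signed_relative_error(actual, predicted)) <= tol_pct

# Hypothetical time series: constant true concentration, drifting predictions.
actual = np.array([5.0, 5.0, 5.0, 5.0])
predicted = np.array([5.5, 4.2, 1.5, 5.9])

errors = signed_relative_error(actual, predicted)
ok = within_tolerance(actual, predicted)
print(np.round(errors, 1), ok)
```

The third point in this made-up series corresponds to a -70% error and falls outside the band, while the other three are accepted.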