2025 Spring Meeting and 21st Global Congress on Process Safety

(150b) Clustering-Based Adaptive Sampling (CAS) for Systems Metamodeling

Authors

Iftekhar Karimi, National University of Singapore
As industries embrace the innovations of Industry 4.0, AI and ML technologies have received increased attention. Advances in computing resources, data-storage capabilities, and ML techniques have made data-driven modeling a key driver of this transition. By effectively learning the correlations between input and output data, data-driven models (metamodels or surrogate models) can capture a system’s behavior throughout the input domain (systems modeling). Two key factors affect the modeling capabilities of surrogate models, viz. the surrogate form (model form and configuration) and the quality and quantity of data available. The surrogate form chosen to develop the model should have sufficient flexibility to capture the underlying nonlinearities of the response profile. Additionally, the input-output data on which the surrogate model is trained should yield maximal system information so that the trained surrogate can be reliably used for predictions across the input domain. Both factors are equally important in systems modeling. While the identification and development of the best surrogate forms for systems modeling and optimization have seen promising progress over the past few years (Ahmad and Karimi, 2022, 2021; Cozad et al., 2014; Garud et al., 2018; Sun and Braatz, 2021), the exercise of sampling high-quality data requires more attention.

Adaptive sampling techniques that progressively sample input data (points in the input domain) from the most informative regions can potentially generate quality data for systems modeling. Rather than sampling points in a static, one-shot manner by covering the domain uniformly with several points, adaptive sampling techniques add points iteratively, balancing global exploration of the entire domain with local exploitation of specific regions that may be difficult to model, such as regions characterized by nonlinearities, kinks, and discontinuities in the response profile. Existing works on adaptive sampling suffer from different limitations, such as being valid only for a particular modeling technique (Farhang‐Mehr and Azarm, 2005; Lam and Notz, 2008); relying on jackknifing for variance estimation, which may be inaccurate and time-consuming (Eason and Cremaschi, 2014; Kleijnen and Beers, 2004); requiring iterative optimization that may be time-consuming and expensive (Garud et al., 2017; Li et al., 2010); or constructing compute-intensive Delaunay triangulations or Voronoi tessellations in high dimensions (Crombecq et al., 2009; Xu et al., 2014). Moreover, most of the existing adaptive sampling algorithms require the user to specify the number of points to be sampled or to provide a target surrogate accuracy for termination. However, when system knowledge is insufficient or the response profile is complex, specifying a guess value for the surrogate accuracy or the number of points may not be straightforward.

To this end, we develop a novel adaptive sampling algorithm for systems modeling, Clustering-based Adaptive Sampling (CAS), which addresses the common limitations of past works and is capable of self-termination. At each iteration, our algorithm uses k-means clustering to group the sampled points into a few well-separated clusters and defines local regions by approximating the Voronoi tessellation formed by the cluster centroids with hypercubes. Each region is characterized by its hypervolume and a nonlinearity score, which together guide exploration and exploitation of the input domain in the search for a promising subdomain for point addition. Two new points are added per iteration, one by exploration and one by exploitation, using a candidate sampling strategy. This iterative process continues until the average improvement in surrogate accuracy over the last few consecutive iterations falls below a threshold value.
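The iteration described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several choices are assumptions made purely for illustration: the axis-aligned bounding box of each cluster's members stands in for the hypercube approximation of a Voronoi cell; the spread of responses within a cluster is a crude placeholder for the nonlinearity score, which the abstract does not define; the exploration region is taken to be the one with the largest hypervolume and the exploitation region the one with the highest score; and candidates are ranked by their minimum distance to existing points. All function names, `window`, and `tol` are hypothetical.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def build_regions(points, values, labels, k):
    """Assumed region model: bounding box, its volume, and a placeholder score."""
    dim = len(points[0])
    regions = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        lo = [min(points[i][d] for i in idx) for d in range(dim)]
        hi = [max(points[i][d] for i in idx) for d in range(dim)]
        volume = math.prod(h - l for l, h in zip(lo, hi))
        ys = [values[i] for i in idx]
        score = max(ys) - min(ys)  # placeholder, not the CAS nonlinearity score
        regions.append({"box": (lo, hi), "volume": volume, "score": score})
    return regions

def pick_candidate(box, points, rng, n_cand=200):
    """Draw random candidates in the box; keep the one farthest from all samples."""
    lo, hi = box
    cands = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in range(n_cand)]
    return max(cands, key=lambda c: min(math.dist(c, p) for p in points))

def cas_step(points, values, k=3, seed=0):
    """One illustrative iteration: returns (exploration point, exploitation point)."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    regions = build_regions(points, values, labels, k)
    explore = max(regions, key=lambda r: r["volume"])
    exploit = max(regions, key=lambda r: r["score"])
    return (pick_candidate(explore["box"], points, rng),
            pick_candidate(exploit["box"], points, rng))

def should_stop(errors, window=3, tol=1e-3):
    """True when the mean error reduction over the last `window` steps is below tol."""
    if len(errors) <= window:
        return False
    recent = errors[-(window + 1):]
    gains = [recent[i] - recent[i + 1] for i in range(window)]
    return sum(gains) / window < tol

# Usage on a toy 2-D response (the test function is made up for this example).
rng = random.Random(42)
pts = [(rng.random(), rng.random()) for _ in range(30)]
vals = [math.sin(6 * x) * y for x, y in pts]
new_explore, new_exploit = cas_step(pts, vals, k=3)
print("exploration point:", new_explore)
print("exploitation point:", new_exploit)
```

In a full loop, the two new points would be evaluated on the system, appended to the sample set, and the surrogate refitted; `should_stop` would then be called on the history of surrogate errors to decide whether to terminate.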

We compared CAS against two promising sampling techniques: the static Sobol technique (SOB), which has excellent space-filling capabilities, and a recent adaptive sampling algorithm, the Smart Sampling Algorithm (SSA) (Garud et al., 2017). We defined two performance indicators, “Performance Edge” and “Computational Efficiency”, to quantify, respectively, the advantage of using CAS over SOB or SSA in developing accurate surrogates and the expected savings in computational budget by CAS over SOB or SSA. Our numerical study spanned a diverse test bed of 40 test functions with varying shapes and input dimensionalities, varying numbers of sample points, and six surrogate forms from three modeling techniques. Our extensive assessment showed that CAS outperformed SOB and SSA for the majority of the test functions on both performance indicators with all surrogate forms. We applied our algorithm to modeling two pharmaceutical processes: a continuous mixing process involving 29 inputs and a solvent swap process involving 12 inputs. Since SSA does not scale well to high-dimensional systems, we compared CAS only against SOB in modeling both systems. CAS developed more accurate surrogates than SOB at the same computational budget for both processes.

Thus, our proposed adaptive sampling algorithm serves as a robust and reliable sampling technique for accurately modeling high-dimensional, complex systems at low computational budgets. In the future, we aim to extend CAS to generate useful data for efficiently modeling constrained systems.

References:

Ahmad, M., Karimi, I.A., 2022. Families of similar surrogate forms based on predictive accuracy and model complexity. Comput. Chem. Eng. 163, 107845. https://doi.org/10.1016/j.compchemeng.2022.107845

Ahmad, M., Karimi, I.A., 2021. Revised learning based evolutionary assistive paradigm for surrogate selection (LEAPS2v2). Comput. Chem. Eng. 152, 107385. https://doi.org/10.1016/j.compchemeng.2021.107385

Cozad, A., Sahinidis, N.V., Miller, D.C., 2014. Learning surrogate models for simulation‐based optimization. AIChE J. 60, 2211–2227. https://doi.org/10.1002/aic.14418

Crombecq, K., De Tommasi, L., Gorissen, D., Dhaene, T., 2009. A novel sequential design strategy for global surrogate modeling, in: Proceedings of the 2009 Winter Simulation Conference (WSC). IEEE, Austin, TX, USA, pp. 731–742. https://doi.org/10.1109/WSC.2009.5429687

Eason, J., Cremaschi, S., 2014. Adaptive sequential sampling for surrogate model generation with artificial neural networks. Comput. Chem. Eng. 68, 220–232. https://doi.org/10.1016/j.compchemeng.2014.05.021

Farhang‐Mehr, A., Azarm, S., 2005. Bayesian meta‐modelling of engineering design simulations: a sequential approach with adaptation to irregularities in the response behaviour. Int. J. Numer. Methods Eng. 62, 2104–2126. https://doi.org/10.1002/nme.1261

Garud, S.S., Karimi, I.A., Kraft, M., 2018. LEAPS2: Learning based Evolutionary Assistive Paradigm for Surrogate Selection. Comput. Chem. Eng. 119, 352–370. https://doi.org/10.1016/j.compchemeng.2018.09.008

Garud, S.S., Karimi, I.A., Kraft, M., 2017. Smart Sampling Algorithm for Surrogate Model Development. Comput. Chem. Eng. 96, 103–114. https://doi.org/10.1016/j.compchemeng.2016.10.006

Kleijnen, J.P.C., Beers, W.C.M.V., 2004. Application-driven sequential designs for simulation experiments: Kriging metamodelling. J. Oper. Res. Soc. 55, 876–883. https://doi.org/10.1057/palgrave.jors.2601747

Lam, C.Q., Notz, W.I., 2008. Sequential adaptive designs in computer experiments for response surface model fit. Stat. Appl. 6, 207–233.

Li, G., Aute, V., Azarm, S., 2010. An accumulative error based adaptive design of experiments for offline metamodeling. Struct. Multidiscip. Optim. 40, 137–155. https://doi.org/10.1007/s00158-009-0395-z

Sun, W., Braatz, R.D., 2021. Smart process analytics for predictive modeling. Comput. Chem. Eng. 144, 107134. https://doi.org/10.1016/j.compchemeng.2020.107134

Xu, S., Liu, H., Wang, X., Jiang, X., 2014. A Robust Error-Pursuing Sequential Sampling Approach for Global Metamodeling Based on Voronoi Diagram and Cross Validation. J. Mech. Des. 136, 071009. https://doi.org/10.1115/1.4027161