2025 Spring Meeting and 21st Global Congress on Process Safety
(150b) Clustering-Based Adaptive Sampling (CAS) for Systems Metamodeling
Adaptive sampling techniques, which progressively sample input points from the most informative regions of the input domain, can potentially generate high-quality data for systems modeling. Rather than sampling points in a static, one-shot manner by covering the domain uniformly, adaptive sampling techniques add points iteratively, balancing global exploration of the entire domain with local exploitation of specific regions that may be difficult to model, such as regions characterized by nonlinearities, kinks, and discontinuities in the response profile. Existing adaptive sampling methods suffer from various limitations, such as being valid only for a particular modeling technique (Farhang‐Mehr and Azarm, 2005; Lam and Notz, 2008); relying on jackknifing for variance estimation, which may be inaccurate and time-consuming (Eason and Cremaschi, 2014; Kleijnen and Beers, 2004); requiring iterative optimization that may be time-consuming and expensive (Garud et al., 2017; Li et al., 2010); or constructing compute-intensive Delaunay triangulations or Voronoi tessellations in high dimensions (Crombecq et al., 2009; Xu et al., 2014). Moreover, most existing adaptive sampling algorithms require the user to specify the number of points to be sampled or to provide a target surrogate accuracy for termination. However, when system knowledge is insufficient or the response profile is complex, specifying a guess for the surrogate accuracy or the number of points may not be straightforward.
To this end, we develop a novel adaptive sampling algorithm for systems modeling, Clustering-based Adaptive Sampling (CAS), which addresses the common limitations of past works and is capable of self-termination. At each iteration, our algorithm uses k-means clustering to group the sampled points into a few well-separated clusters and defines local regions by approximating, via hypercubes, the Voronoi tessellation formed by the cluster centroids. Each region is associated with a hypervolume and a nonlinearity score, which together guide exploration and exploitation of the input domain in the search for a promising subdomain for point addition. Two new points are then added, one by exploration and one by exploitation, using a candidate sampling strategy. This iterative process continues until the average improvement in surrogate accuracy over the last few consecutive iterations falls below a threshold value.
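The clustering, region-definition, and self-termination steps described above can be illustrated with a minimal sketch. It is not the CAS implementation: the abstract does not specify how the hypercubes are built, how the nonlinearity score is computed, or how accuracy improvement is measured. The sketch assumes, purely for illustration, that each hypercube is the axis-aligned bounding box of a cluster's members and that improvement is the drop in a surrogate error metric between consecutive iterations. The function names (`kmeans`, `bounding_hypercubes`, `hypervolume`, `should_stop`) and the parameters `window` and `tol` are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids are k distinct sample points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Move each centroid to the mean of its members; keep it if the cluster is empty.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])
        if new_centroids == centroids:  # converged
            break
        centroids = new_centroids
    return labels


def bounding_hypercubes(points, labels):
    """Assumed region definition: axis-aligned box (lower, upper) per non-empty cluster."""
    boxes = {}
    for j in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == j]
        lower = tuple(min(c) for c in zip(*members))
        upper = tuple(max(c) for c in zip(*members))
        boxes[j] = (lower, upper)
    return boxes


def hypervolume(box):
    """Product of the box's side lengths."""
    lower, upper = box
    return math.prod(u - l for l, u in zip(lower, upper))


def should_stop(errors, window=3, tol=1e-3):
    """Assumed stopping rule: mean error reduction over the last `window`
    iterations is below `tol`. `errors` holds one surrogate error per iteration."""
    if len(errors) < window + 1:
        return False  # not enough history yet
    recent = errors[-(window + 1):]
    improvements = [recent[i] - recent[i + 1] for i in range(window)]
    return sum(improvements) / window < tol
```

In a loop of this kind, one would recluster the current sample at each iteration, rank the resulting regions (for example, by hypervolume for exploration), add the new points, refit the surrogate, append its error to `errors`, and stop once `should_stop(errors)` returns `True`.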
We compared CAS against two promising sampling techniques: the static Sobol technique (SOB), which has excellent space-filling capabilities, and a recent adaptive sampling algorithm, the Smart Sampling Algorithm (SSA; Garud et al., 2017). We defined two performance indicators, “Performance Edge” and “Computational Efficiency”, to quantify, respectively, the advantage of using CAS over SOB or SSA in developing accurate surrogates and the expected savings in computational budget achieved by CAS over SOB or SSA. Our numerical study spanned a diverse test bed of 40 test functions with varying shapes and input dimensionalities (), varying numbers of sample points, and six surrogate forms from three modeling techniques. Our extensive assessment showed that CAS outperformed SOB and SSA for the majority of the test functions on both performance indicators and with all surrogate forms. We also applied our algorithm to modeling two pharmaceutical processes: a continuous mixing process involving 29 inputs and a solvent swap process involving 12 inputs. Since SSA does not scale well to high-dimensional systems, we compared CAS only against SOB for both processes. CAS developed more accurate surrogates than SOB at the same computational budget in both cases.
Thus, our proposed adaptive sampling algorithm serves as a robust and reliable sampling technique for accurately modeling high-dimensional, complex systems at low computational budgets. In the future, we aim to extend CAS to efficiently generate useful data for modeling constrained systems.
References:
Ahmad, M., Karimi, I.A., 2022. Families of similar surrogate forms based on predictive accuracy and model complexity. Comput. Chem. Eng. 163, 107845. https://doi.org/10.1016/j.compchemeng.2022.107845
Ahmad, M., Karimi, I.A., 2021. Revised learning based evolutionary assistive paradigm for surrogate selection (LEAPS2v2). Comput. Chem. Eng. 152, 107385. https://doi.org/10.1016/j.compchemeng.2021.107385
Cozad, A., Sahinidis, N.V., Miller, D.C., 2014. Learning surrogate models for simulation‐based optimization. AIChE J. 60, 2211–2227. https://doi.org/10.1002/aic.14418
Crombecq, K., De Tommasi, L., Gorissen, D., Dhaene, T., 2009. A novel sequential design strategy for global surrogate modeling, in: Proceedings of the 2009 Winter Simulation Conference (WSC). Presented at the 2009 Winter Simulation Conference - (WSC 2009), IEEE, Austin, TX, USA, pp. 731–742. https://doi.org/10.1109/WSC.2009.5429687
Eason, J., Cremaschi, S., 2014. Adaptive sequential sampling for surrogate model generation with artificial neural networks. Comput. Chem. Eng. 68, 220–232. https://doi.org/10.1016/j.compchemeng.2014.05.021
Farhang‐Mehr, A., Azarm, S., 2005. Bayesian meta‐modelling of engineering design simulations: a sequential approach with adaptation to irregularities in the response behaviour. Int. J. Numer. Methods Eng. 62, 2104–2126. https://doi.org/10.1002/nme.1261
Garud, S.S., Karimi, I.A., Kraft, M., 2018. LEAPS2: Learning based Evolutionary Assistive Paradigm for Surrogate Selection. Comput. Chem. Eng. 119, 352–370. https://doi.org/10.1016/j.compchemeng.2018.09.008
Garud, S.S., Karimi, I.A., Kraft, M., 2017. Smart Sampling Algorithm for Surrogate Model Development. Comput. Chem. Eng. 96, 103–114. https://doi.org/10.1016/j.compchemeng.2016.10.006
Kleijnen, J.P.C., Beers, W.C.M.V., 2004. Application-driven sequential designs for simulation experiments: Kriging metamodelling. J. Oper. Res. Soc. 55, 876–883. https://doi.org/10.1057/palgrave.jors.2601747
Lam, C.Q., Notz, W.I., 2008. Sequential adaptive designs in computer experiments for response surface model fit. Stat. Appl. 6, 207–233.
Li, G., Aute, V., Azarm, S., 2010. An accumulative error based adaptive design of experiments for offline metamodeling. Struct. Multidiscip. Optim. 40, 137–155. https://doi.org/10.1007/s00158-009-0395-z
Sun, W., Braatz, R.D., 2021. Smart process analytics for predictive modeling. Comput. Chem. Eng. 144, 107134. https://doi.org/10.1016/j.compchemeng.2020.107134
Xu, S., Liu, H., Wang, X., Jiang, X., 2014. A Robust Error-Pursuing Sequential Sampling Approach for Global Metamodeling Based on Voronoi Diagram and Cross Validation. J. Mech. Des. 136, 071009. https://doi.org/10.1115/1.4027161