2020 Virtual AIChE Annual Meeting

(522a) Comparison of Surrogate Modeling Techniques for Design Space Approximation and Surrogate-Based Optimization: Effect of Sampling Technique and Sample Size

Authors

Williams, B. - Presenter, Auburn University
Cremaschi, S., Auburn University
Bianca Williams, Selen Cremaschi

Session: Advances in Machine Learning and Intelligent Systems

Surrogate models, also known as response surfaces, black-box models, metamodels, or emulators, are simplified approximations of more complex, higher-order models. These models are used to map input data to output data when the actual relationship between the two is unknown or computationally expensive to evaluate (Han & Zhang, 2012). Surrogate models can also be constructed for use in surrogate-based optimization when a closed analytical form of the relationship between input data and output data does not exist or is not conducive to use in traditional gradient-based optimization methods. Surrogate modeling techniques are of particular interest where high-fidelity, and thus expensive, simulations are used (Han & Zhang, 2012) or when the fundamental relationship between the design variables and output variables is not well understood, such as in the design of cell or tissue manufacturing processes (Du et al., 2016).

Given the number of surrogate modeling techniques currently available, there is a need for a systematic procedure for selecting the appropriate technique for a given application. Current common practices for selecting the surrogate model form rely on process-specific expertise. Numerous studies have compared the performance of surrogate modeling techniques for approximation purposes (Bhosekar & Ierapetritou, 2018; Davis et al., 2017). Most of these compare only a few models on a limited number of data sets or for specific applications (Ju et al., 2016; Luo & Lu, 2014). Recent works have made progress in generalizing the selection of a surrogate model to approximate a design space by using meta-learning approaches to build selection frameworks (Cui et al., 2016; Garud et al., 2018), avoiding expensive trial-and-error methods. However, few of the developed meta-learning frameworks take model complexity into account, which can lead to overfitting, or consider that multiple models might perform similarly, in terms of accuracy, to the one identified as best. The selection of surrogate models for surrogate-based optimization remains an open challenge.

Our goal is to comprehensively investigate and compare the performance of several surrogate modeling techniques, both for approximating functional relationships and for surrogate-based optimization, and to link that performance to the characteristics of the data involved in the application. Previous work on this topic has shown that the performance for approximation depends on data characteristics such as the input dimension and the underlying function shape (Davis et al., 2017; Williams & Cremaschi, 2019). The specific data characteristics investigated in this study are the shape of the underlying function being modeled, the number of input dimensions, the sampling method used to generate the data, and the number of sample points in the dataset. The surrogate modeling techniques considered include Artificial Neural Networks, Automated Learning of Algebraic Models using Optimization (ALAMO), Radial Basis Networks, Extreme Learning Machines, Gaussian Process Regression, Random Forests, Support Vector Regression, and Multivariate Adaptive Regression Splines (MARS). These techniques are used to construct surrogate models for data generated using the 47 optimization challenge functions from the Virtual Library of Simulation Experiments (Surjanovic & Bingham, 2013). The sampling methods studied are Sobol sequence sampling, Halton sequence sampling, and Latin Hypercube sampling. Four performance measures are used to evaluate the accuracy of the surrogate models: root mean squared error (RMSE), maximum percent error (MPE), the R-squared value, and the adjusted R-squared value. The surrogate models’ ability to locate the extrema of the functions is evaluated by calculating the distance between the extreme point(s) estimated by the model and the actual function extrema. The results indicate which surrogate models generate the best predictions for data with given characteristics and yield general “rules of thumb” for model selection.
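The sampling and scoring steps described above can be illustrated with a small, dependency-free sketch. It is not the code used in the study. The function names are hypothetical, and the MPE definition (error relative to the true value) and the adjusted R-squared parameter count are assumptions, since the abstract does not give formulas. Sobol sampling is omitted because it needs direction-number tables; a library generator such as `scipy.stats.qmc.Sobol` would normally be used instead.

```python
import math
import random


def halton(n, dim):
    """First n Halton points in [0, 1)^dim, using one prime base per dimension."""
    primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
    if dim > len(primes):
        raise ValueError("this sketch supports at most 10 dimensions")

    def radical_inverse(i, base):
        # Mirror the base-`base` digits of i about the radix point.
        value, frac = 0.0, 1.0 / base
        while i > 0:
            value += (i % base) * frac
            i //= base
            frac /= base
        return value

    return [[radical_inverse(i, primes[d]) for d in range(dim)]
            for i in range(1, n + 1)]


def latin_hypercube(n, dim, seed=0):
    """n points in [0, 1)^dim with one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([(s + rng.random()) / n for s in strata])
    return [[columns[d][i] for d in range(dim)] for i in range(n)]


def rmse(y_true, y_pred):
    """Root mean squared error."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def max_percent_error(y_true, y_pred):
    """Assumed definition: largest |error| relative to the true value, in percent.
    Undefined when a true value is zero."""
    return 100.0 * max(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_params):
    """R-squared penalized for model size; n_params is the assumed number of
    fitted terms in the surrogate (excluding the intercept)."""
    n = len(y_true)
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_params - 1)


def extremum_distance(x_estimated, x_true):
    """Euclidean distance between estimated and actual extremum locations."""
    return math.dist(x_estimated, x_true)
```

In use, the unit-cube samples would be rescaled to each test function's input bounds before the function and the surrogate are evaluated, and the metrics would be computed on points not used for training.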

Using information extracted from the surrogate modeling comparison experiments and building upon previous meta-learning approaches, we constructed a tool that recommends appropriate modeling techniques for a dataset based only on the characteristics of the data being modeled. Characteristics, i.e., attributes, were calculated for each dataset with the goal of representing its overall behavior. Attributes were calculated based only on the input and output values in the dataset. The attributes with the strongest relationships to the performance metrics were determined using feature reduction methods, including the ReliefF algorithm (Kira & Rendell, 1992) and principal component analysis (Hotelling, 1933). These attributes were used as inputs, with designated performance metrics as outputs, to train models that predict the performance of the surrogate modeling techniques. The performance metrics used as outputs for training the recommendation tool are the adjusted R-squared value for the design space approximation application and the normalized distance between the extreme point(s) estimated by the models and the actual extrema of the true model for surrogate-based optimization. The adjusted R-squared value takes into account both the surrogate model accuracy and its complexity (Miles, 2005). Given a set of data, the tool identifies which surrogate modeling techniques are recommended for either approximating a design space or surrogate-based optimization.
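One way to picture the recommendation step is the sketch below. It is an illustration under stated assumptions, not the authors' tool: the attribute set, the k-nearest-neighbour predictor, the tolerance rule, and all function names are hypothetical, since the abstract does not specify which attributes or predictive models were used. The sketch assumes a higher-is-better score such as adjusted R-squared; for the normalized extremum distance the comparison would be reversed.

```python
import math
import statistics


def dataset_attributes(X, y):
    """Hypothetical attributes computed only from input and output values:
    input dimension, sample size, and output mean, spread, and range."""
    return [
        float(len(X[0])),
        float(len(X)),
        statistics.mean(y),
        statistics.pstdev(y),
        max(y) - min(y),
    ]


def predict_scores(attrs, history, k=3):
    """Estimate each technique's score for a new dataset by averaging the
    scores observed on the k most similar past datasets.

    history: list of (attribute_vector, {technique_name: score}) records.
    Attributes should be put on comparable scales before distances are used.
    """
    nearest = sorted(history, key=lambda rec: math.dist(attrs, rec[0]))[:k]
    techniques = list(nearest[0][1])
    return {t: statistics.mean([rec[1][t] for rec in nearest])
            for t in techniques}


def recommend(scores, tolerance=0.02):
    """Return every technique whose predicted score is within `tolerance`
    of the best one, so near-ties are all reported."""
    best = max(scores.values())
    return sorted(t for t, s in scores.items() if best - s <= tolerance)
```

Returning all techniques within a tolerance of the best, rather than a single winner, reflects the point that several models may perform similarly on a given dataset.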

References:

Bhosekar, A., & Ierapetritou, M. (2018). Advances in surrogate based modeling, feasibility analysis, and optimization: A review. Computers & Chemical Engineering, 108, 250-267.

Cui, C., et al. (2016). A recommendation system for meta-modeling: A meta-learning based approach. Expert Systems with Applications, 46, 33-44.

Davis, S., et al. (2017). Efficient Surrogate Model Development: Optimum Model Form Based on Input Function Characteristics. In A. Espuna, M. Graells & L. Puigjaner (Eds.), 27th European Symposium on Computer Aided Process Engineering (ESCAPE 27) (Vol. 40, pp. 457-462). Barcelona, Spain: Elsevier.

Du, D., et al. (2016). Statistical Metamodeling and Sequential Design of Computer Experiments to Model Glyco-Altered Gating of Sodium Channels in Cardiac Myocytes. IEEE J Biomed Health Inform, 20, 1439-1452.

Garud, S. S., et al. (2018). LEAPS2: Learning based Evolutionary Assistive Paradigm for Surrogate Selection. Computers & Chemical Engineering, 119, 352-370.

Han, Z., & Zhang, K. (2012). Surrogate-Based Optimization. In O. Roeva (Ed.), Real-World Applications of Genetic Algorithms (pp. 343-362). Rijeka, Croatia: InTech Open.

Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24, 498-520.

Ju, Y. P., et al. (2016). Artificial intelligence metamodel comparison and application to wind turbine airfoil uncertainty analysis. Advances in Mechanical Engineering, 8.

Kira, K., & Rendell, L. A. (1992). A Practical Approach to Feature Selection. In Machine Learning Proceedings 1992 (pp. 249-256).

Luo, J. N., & Lu, W. X. (2014). Comparison of surrogate models with different methods in groundwater remediation process. Journal of Earth System Science, 123, 1579-1589.

Miles, J. (2005). R Squared, Adjusted R Squared. In Encyclopedia of Statistics in Behavioral Science. John Wiley & Sons Ltd.

Surjanovic, S., & Bingham, D. (2013). Virtual Library of Simulation Experiments. Simon Fraser University.

Williams, B., & Cremaschi, S. (2019). Surrogate Model Selection for Design-Space Approximation and Surrogate-Based Optimization. In S. Munoz, C. Laird & M. Realff (Eds.), Ninth International Conference on Foundations of Computer-Aided Process Design (FOCAPD-19) (pp. 353-358). Copper Mountain, CO, USA: Elsevier B.V.