2015 AIChE Spring Meeting and 11th Global Congress on Process Safety

(149c) ALAMO: Automatic Learning of Algebraic Models Using Optimization

We address the problem of discovering algebraic relationships that are hidden in a set of data, an experimental process, or a simulation model. The problem lies at the interface of statistical experimental design, optimization, and machine learning. We present a methodology for developing models that are both simple and accurate while minimizing the number of experiments or simulations required of the system under study. The methodology begins by building a low-complexity model of the system using integer optimization techniques. The model is then tested, exploited, and improved by using derivative-free optimization to adaptively sample new experimental or simulation points. Semi-infinite optimization techniques facilitate a combined data- and theory-driven approach to model building. We provide computational comparisons between ALAMO, the computational implementation of the proposed methodology, and a variety of machine learning and statistical techniques, including Latin hypercube sampling, simple least squares regression, and the lasso. Finally, we demonstrate how ALAMO’s adaptive sampling technique can be used to learn models by selecting small numbers of samples from huge data sets or even from infinitely many data points.
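The build-then-resample loop described above can be illustrated with a minimal Python sketch. It is not ALAMO's implementation, and every name in it is hypothetical. Two simplifications are assumed: brute-force enumeration of small basis subsets scored by BIC stands in for the integer optimization step, and a uniform random search for the largest model error stands in for derivative-free optimization. The candidate basis set, the single input variable, and the tolerance are likewise illustrative choices.

```python
import itertools
import numpy as np

# Hypothetical candidate basis functions for a single input x (an assumption).
BASIS = {
    "1": lambda x: np.ones_like(x),
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "x^3": lambda x: x**3,
    "exp(x)": lambda x: np.exp(x),
    "sin(x)": lambda x: np.sin(x),
}

def design(x, names):
    """Matrix whose columns are the selected basis functions evaluated at x."""
    return np.column_stack([BASIS[n](x) for n in names])

def best_subset(x, y, max_terms=3):
    """Enumerate small subsets of basis functions and keep the lowest-BIC fit.

    Brute-force enumeration is a stand-in for integer optimization here.
    """
    n = len(x)
    best = None
    for k in range(1, max_terms + 1):
        for names in itertools.combinations(BASIS, k):
            A = design(x, names)
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            sse = float(resid @ resid)
            # Small floor keeps log() finite when the fit is exact.
            bic = n * np.log(sse / n + 1e-12) + k * np.log(n)
            if best is None or bic < best[0]:
                best = (bic, names, coef)
    return best[1], best[2]

def worst_point(simulate, names, coef, lo, hi, rng, n_trials=200):
    """Random search for the point of largest model error.

    A crude stand-in for derivative-free optimization.
    """
    cand = rng.uniform(lo, hi, n_trials)
    err = np.abs(simulate(cand) - design(cand, names) @ coef)
    i = int(np.argmax(err))
    return cand[i], err[i]

def learn(simulate, lo, hi, n_init=4, tol=1e-3, max_iter=10, seed=0):
    """Alternate between model building and adaptive sampling."""
    rng = np.random.default_rng(seed)
    x = np.linspace(lo, hi, n_init)
    y = simulate(x)
    for _ in range(max_iter):
        names, coef = best_subset(x, y)
        x_new, err = worst_point(simulate, names, coef, lo, hi, rng)
        if err < tol:
            break  # model agrees with the simulator everywhere we looked
        x = np.append(x, x_new)
        y = np.append(y, simulate(x_new))
    return names, coef, x
```

A call such as `learn(lambda x: 3*x**2 - 2*x, -1.0, 1.0)` returns the selected term names, their coefficients, and the points that were sampled. Enumeration scales combinatorially with the number of candidate terms, which is one reason a real implementation would pose the subset selection as an optimization problem instead.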