The ALAMO approach to machine learning
This work addresses the need for interpretable and constrained machine learning models in domains like chemical engineering, though it appears incremental as it builds on existing regression and sampling techniques.
The paper tackles the problem of learning algebraic functions from data by introducing ALAMO, a computational methodology that builds low-complexity linear models from non-linear transformations of the inputs and refines them adaptively through error maximization sampling. It demonstrates that ALAMO generates simple and accurate models for reaction problems, that error maximization sampling is more sample-efficient than Latin hypercube designs, and that constrained regression yields models that perform better on validation data.
ALAMO is a computational methodology for learning algebraic functions from data. Given a data set, the approach begins by building a low-complexity, linear model composed of explicit non-linear transformations of the independent variables. Linear combinations of these non-linear transformations allow a linear model to better approximate complex behavior observed in real processes. The model is refined as additional data are obtained adaptively through error maximization sampling using derivative-free optimization. Models built using ALAMO can enforce constraints on the response variables to incorporate first-principles knowledge. The ability of ALAMO to generate simple and accurate models for a number of reaction problems is demonstrated. The error maximization sampling is compared with Latin hypercube designs to demonstrate its sampling efficiency. ALAMO's constrained regression methodology is used to further refine concentration models, resulting in models that perform better on validation data and satisfy upper and lower bounds placed on model outputs.
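The two ideas in the abstract, a model that is linear in its coefficients but built from non-linear transformations, and refinement by sampling where the current model is worst, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not ALAMO's actual implementation: the basis set, the exhaustive best-subset search scored by BIC, the uniform random search standing in for a derivative-free optimizer, and the names `best_low_complexity_model`, `error_maximization_sample`, and `adaptive_fit` are all hypothetical.

```python
import itertools
import numpy as np

# Hypothetical basis set: explicit non-linear transformations of a scalar input x > 0.
BASIS = {
    "1": lambda x: np.ones_like(x),
    "x": lambda x: x,
    "x^2": lambda x: x**2,
    "x^3": lambda x: x**3,
    "exp(x)": lambda x: np.exp(x),
    "log(x)": lambda x: np.log(x),
}

def design_matrix(x, names):
    """Stack the chosen transformations as columns; the model stays linear in its coefficients."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.column_stack([BASIS[n](x) for n in names])

def predict(x, names, coef):
    return design_matrix(x, names) @ coef

def best_low_complexity_model(x, y, max_terms=3):
    """Exhaustive best-subset search scored by BIC (an assumed stand-in for model selection)."""
    n = len(x)
    best = None
    for k in range(1, max_terms + 1):
        for names in itertools.combinations(BASIS, k):
            A = design_matrix(x, names)
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ coef
            sse = float(resid @ resid)
            # Clamp the error so exact fits do not send the logarithm to -inf.
            bic = n * np.log(max(sse, 1e-12) / n) + k * np.log(n)
            if best is None or bic < best[0]:
                best = (bic, names, coef)
    return best[1], best[2]

def error_maximization_sample(simulator, names, coef, lo, hi, rng, n_candidates=200):
    """Random search (a simple derivative-free method) for the point of largest model error."""
    cand = rng.uniform(lo, hi, size=n_candidates)
    err = np.abs(simulator(cand) - predict(cand, names, coef))
    i = int(np.argmax(err))
    return float(cand[i]), float(err[i])

def adaptive_fit(simulator, lo, hi, n_init=4, max_iter=10, tol=1e-3, seed=0):
    """Fit, sample where the model is worst, add that point, and refit."""
    rng = np.random.default_rng(seed)
    x = np.linspace(lo, hi, n_init)
    y = simulator(x)
    names, coef = best_low_complexity_model(x, y)
    for _ in range(max_iter):
        x_new, err = error_maximization_sample(simulator, names, coef, lo, hi, rng)
        if err < tol:
            break
        x = np.append(x, x_new)
        y = np.append(y, simulator(x_new))
        names, coef = best_low_complexity_model(x, y)
    return names, coef, x

if __name__ == "__main__":
    # Hypothetical "process": a saturation-type rate expression that is not in the basis set.
    names, coef, x_used = adaptive_fit(lambda x: x / (1.0 + x), lo=0.5, hi=3.0)
    print("selected terms:", names)
    print("coefficients:", coef)
    print("points used:", len(x_used))
```

The search over subsets is exponential in the number of basis functions, so this sketch only suits small basis sets. Constraints on the response, such as bounds on a predicted concentration, are not handled here; they would turn the least-squares step into a constrained regression problem.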