Designing Machine Learning Pipeline Toolkit for AutoML Surrogate Modeling Optimization
This work addresses the complexity of managing and optimizing machine learning pipelines for AutoML users, though it appears incremental as it builds on existing surrogate modeling and toolkit concepts.
The authors address the pipeline optimization problem in machine learning by developing the AutoMLPipeline (AMLP) toolkit, which uses simple expressions to create and evaluate complex pipelines, together with a two-stage optimization based on surrogate modeling. In under 5 minutes of computation, this approach outperforms other AutoML methods that were given a 4-hour time budget.
The pipeline optimization problem in machine learning requires simultaneously optimizing pipeline structures and adapting the parameters of their elements. An elegant way to express these structures reduces the complexity of managing them and of analyzing their performance under different optimization strategies. With these issues in mind, we created the AutoMLPipeline (AMLP) toolkit, which facilitates the creation and evaluation of complex machine learning pipeline structures using simple expressions. We use AMLP to find optimal pipeline signatures, datamine them, and use these datamined features to speed up learning and prediction. We formulated a two-stage pipeline optimization with surrogate modeling in AMLP that, in less than 5 minutes of AMLP computation time, outperforms other AutoML approaches run with a 4-hour time budget.
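The two ideas in the abstract, pipeline structures written as simple expressions and a two-stage search guided by a surrogate, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not AMLP's API: the `Step` class, the use of `>>` for sequencing and `+` for combining branches, the per-element averaging surrogate, and the `TOY_EFFECT` scores, which are arbitrary placeholders standing in for real cross-validation and are not results from the paper.

```python
import random
from itertools import product
from statistics import mean


class Step:
    """Hypothetical pipeline element that composes into expressions."""

    def __init__(self, name):
        self.name = name

    def __rshift__(self, other):  # a >> b: feed the output of a into b
        return Seq(self, other)

    def __add__(self, other):  # a + b: combine the outputs of a and b
        return Union(self, other)

    def signature(self):
        return self.name


class Seq(Step):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def signature(self):
        return f"({self.left.signature()} >> {self.right.signature()})"


class Union(Step):
    def __init__(self, left, right):
        self.left, self.right = left, right

    def signature(self):
        return f"({self.left.signature()} + {self.right.signature()})"


def elements(sig):
    """Mine a signature string for the element names it contains."""
    tokens = sig.replace("(", " ").replace(")", " ").split()
    return [t for t in tokens if t not in {">>", "+"}]


# Placeholder scores only (assumption): a stand-in for cross-validated accuracy.
TOY_EFFECT = {"std": 0.02, "minmax": 0.01, "pca": 0.05, "ica": 0.03,
              "rf": 0.80, "svm": 0.75, "knn": 0.70}


def evaluate(pipe):
    return sum(TOY_EFFECT[e] for e in elements(pipe.signature()))


def two_stage_search(candidates, evaluate, n_stage1, n_stage2, seed=0):
    # Stage 1: evaluate a small random sample of pipeline structures.
    rng = random.Random(seed)
    scores = {p.signature(): evaluate(p) for p in rng.sample(candidates, n_stage1)}

    # Surrogate: average observed score of each element seen in stage 1.
    per_element = {}
    for sig, score in scores.items():
        for e in elements(sig):
            per_element.setdefault(e, []).append(score)
    overall = mean(scores.values())

    def predict(pipe):
        return mean(mean(per_element[e]) if e in per_element else overall
                    for e in elements(pipe.signature()))

    # Stage 2: evaluate only the candidates the surrogate ranks highest.
    rest = [p for p in candidates if p.signature() not in scores]
    for p in sorted(rest, key=predict, reverse=True)[:n_stage2]:
        scores[p.signature()] = evaluate(p)

    best = max(scores, key=scores.get)
    return best, scores[best], len(scores)


example = Step("pca") + Step("ica") >> Step("rf")  # signature: ((pca + ica) >> rf)
candidates = [Step(s) >> Step(f) >> Step(m)
              for s, f, m in product(["std", "minmax"], ["pca", "ica"],
                                     ["rf", "svm", "knn"])]
best_sig, best_score, n_evals = two_stage_search(candidates, evaluate, 4, 3)
```

In this sketch only 7 of the 12 candidate structures are evaluated; the surrogate decides which unevaluated structures deserve the remaining budget. A real surrogate would be a learned model over richer signature features, and the search returns the best pipeline among those evaluated, with no guarantee that it is the global optimum.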