Layered TPOT: Speeding up Tree-based Pipeline Optimization
Layered TPOT is an incremental improvement to TPOT that speeds up pipeline optimization for users of automated machine learning tools.
The paper tackles the problem of slow pipeline optimization in AutoML by introducing Layered TPOT, which uses a modified evolutionary algorithm to evaluate pipelines on increasingly large subsets of the data, resulting in faster model discovery on large datasets.
As the demand for machine learning increases, so does the demand for tools that make it easier to use. Automated machine learning (AutoML) tools have been developed to address this need, such as the Tree-Based Pipeline Optimization Tool (TPOT), which uses genetic programming to build optimal pipelines. We introduce Layered TPOT, a modification to TPOT which aims to create pipelines as good as those found by the original, but in significantly less time. This approach evaluates candidate pipelines on increasingly large subsets of the data according to their fitness, using a modified evolutionary algorithm to allow for separate competition between pipelines trained on different sample sizes. Empirical evaluation shows that, on sufficiently large datasets, Layered TPOT indeed finds better models faster.
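The layered idea described above can be illustrated with a small toy sketch. It is not the authors' implementation and does not use the TPOT API. A "pipeline" here is just a single float (a slope fitted to synthetic data), and the layer sizes, promotion schedule, population size, and the names `evaluate`, `mutate`, and `layered_search` are all assumptions made for illustration. What it tries to show is the structure: each layer evaluates candidates on a larger subset, candidates compete only within their own layer, and the fittest are periodically promoted upward.

```python
import random

def evaluate(candidate, data, n_rows):
    """Hypothetical fitness: negative mean squared error of y = candidate * x
    on the first n_rows rows. Cost grows with n_rows, as with real pipelines."""
    subset = data[:n_rows]
    return -sum((y - candidate * x) ** 2 for x, y in subset) / len(subset)

def mutate(candidate, rng, scale=0.5):
    """Hypothetical variation operator: add Gaussian noise to the candidate."""
    return candidate + rng.gauss(0.0, scale)

def layered_search(data, layer_sizes, pop_size=8, generations=30,
                   promote_every=3, seed=0):
    rng = random.Random(seed)
    # One population per layer; only the bottom (smallest-sample) layer
    # starts out populated.
    layers = [[] for _ in layer_sizes]
    layers[0] = [rng.uniform(-10, 10) for _ in range(pop_size)]

    for gen in range(1, generations + 1):
        for i, n_rows in enumerate(layer_sizes):
            if not layers[i]:
                continue
            # Variation and selection happen within one layer only, so
            # candidates are compared on the same sample size.
            pool = layers[i] + [mutate(c, rng) for c in layers[i]]
            pool.sort(key=lambda c: evaluate(c, data, n_rows), reverse=True)
            layers[i] = pool[:pop_size]

        if gen % promote_every == 0:
            # Copy the top half of each layer into the next one. Going from
            # the top down means a candidate rises at most one layer per round.
            for i in range(len(layers) - 2, -1, -1):
                if layers[i]:
                    k = max(1, len(layers[i]) // 2)
                    layers[i + 1].extend(layers[i][:k])
            # Restart the bottom layer with fresh random candidates.
            layers[0] = [rng.uniform(-10, 10) for _ in range(pop_size)]

    # Return the best candidate from the highest populated layer,
    # scored on that layer's sample size.
    for i in range(len(layers) - 1, -1, -1):
        if layers[i]:
            return max(layers[i], key=lambda c: evaluate(c, data, layer_sizes[i]))

# Synthetic data (an assumption for the demo): y = 3x plus small noise.
data_rng = random.Random(42)
data = []
for _ in range(200):
    x = data_rng.uniform(0.0, 1.0)
    data.append((x, 3.0 * x + data_rng.gauss(0.0, 0.1)))

best = layered_search(data, layer_sizes=[25, 50, 100, 200])
print(f"best slope found: {best:.3f}")
```

In this sketch most evaluations happen in the bottom layers, where each one is cheap, and only candidates that already did well on small samples are ever scored on the full data. That is the intuition behind the claimed speed-up. The actual selection scheme, layer sizes, and promotion rules of Layered TPOT are not specified in this abstract.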