LG · May 1, 2021

Exploring Opportunistic Meta-knowledge to Reduce Search Spaces for Automated Machine Learning

arXiv:2105.00282v1 · 6 citations
Originality: Incremental advance
AI Analysis

This work addresses the efficiency and accuracy challenges that practitioners face in AutoML by using meta-knowledge to narrow the pipeline search space. It is an incremental improvement rather than a paradigm shift.

The paper tackles the problem of reducing the search space for automated machine learning (AutoML) pipelines. It preemptively culls poorly performing classifiers/regressors using opportunistic meta-knowledge from previous evaluations and shows that this can improve ML outcomes. The results also indicate that culling should not be too severe: keeping a top tier of predictors works better than relying on a single previous top performer.
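To make the culling idea concrete, here is a minimal sketch, not the paper's implementation. It assumes the meta-knowledge is a table of the best error observed for each predictor on each prior dataset. Predictors are ranked per dataset, ranks are averaged over only the datasets where a predictor was actually evaluated (one possible "loose assumption" for non-exhaustive history), and the top `keep` predictors survive. The function names, dataset names and error values are all hypothetical.

```python
from statistics import mean


def rank_predictors(errors_by_dataset):
    """Return predictors sorted by mean rank across prior datasets (lower is better).

    Hypothetical helper: a predictor missing from a dataset is simply not ranked
    there, so its mean covers only the datasets where it was evaluated.
    """
    ranks = {}
    for errors in errors_by_dataset.values():
        # Rank 1 = lowest observed error on this dataset; ties broken by name.
        ordered = sorted(errors, key=lambda p: (errors[p], p))
        for position, predictor in enumerate(ordered, start=1):
            ranks.setdefault(predictor, []).append(position)
    return sorted(ranks, key=lambda p: (mean(ranks[p]), p))


def cull(errors_by_dataset, keep):
    """Keep only the top `keep` predictors (a 'top tier' when keep > 1)."""
    return rank_predictors(errors_by_dataset)[:keep]


# Hypothetical opportunistic history: best error seen per predictor per dataset.
history = {
    "dataset_a": {"RandomForest": 0.12, "J48": 0.18, "NaiveBayes": 0.25, "SMO": 0.15},
    "dataset_b": {"RandomForest": 0.30, "J48": 0.28, "NaiveBayes": 0.40, "SMO": 0.33},
    "dataset_c": {"RandomForest": 0.05, "J48": 0.09, "NaiveBayes": 0.07},
}

print(cull(history, keep=2))  # ['RandomForest', 'J48']
```

Setting `keep=1` reduces the search to the single previous best performer, which is the severe culling the results caution against. A larger `keep` retains a top tier for the pipeline optimiser to explore.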

Machine learning (ML) pipeline composition and optimisation have been studied to seek multi-stage ML models, i.e. preprocessor-inclusive, that are both valid and well-performing. These processes typically require the design and traversal of complex configuration spaces consisting of not just individual ML components and their hyperparameters, but also higher-level pipeline structures that link these components together. Optimisation efficiency and resulting ML-model accuracy both suffer if this pipeline search space is unwieldy and excessively large; it becomes an appealing notion to avoid costly evaluations of poorly performing ML components ahead of time. Accordingly, this paper investigates whether, based on previous experience, a pool of available classifiers/regressors can be preemptively culled ahead of initiating a pipeline composition/optimisation process for a new ML problem, i.e. dataset. The previous experience comes in the form of classifier/regressor accuracy rankings derived, with loose assumptions, from a substantial but non-exhaustive number of pipeline evaluations; this meta-knowledge is considered 'opportunistic'. Numerous experiments with the AutoWeka4MCPS package, including ones leveraging similarities between datasets via the relative landmarking method, show that, despite its seeming unreliability, opportunistic meta-knowledge can improve ML outcomes. However, results also indicate that the culling of classifiers/regressors should not be too severe either. In effect, it is better to search through a 'top tier' of recommended predictors than to pin hopes onto one previously supreme performer.
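The abstract mentions leveraging dataset similarity via relative landmarking. The sketch below shows one simple interpretation, under stated assumptions; the paper's exact formulation may differ. Each dataset is described by the accuracies of a few cheap "landmarker" learners. Two datasets count as similar when their landmarkers are ordered the same way, regardless of absolute accuracy. That similarity then weights each prior dataset's predictor ranks. The landmarker names, scores, ranks and helper functions are all hypothetical.

```python
from itertools import combinations


def pairwise_agreement(scores_a, scores_b):
    """Fraction of shared landmarker pairs ordered the same way on both datasets.

    1.0 means identical relative ordering; tied pairs do not count as agreement.
    """
    shared = sorted(set(scores_a) & set(scores_b))
    pairs = list(combinations(shared, 2))
    if not pairs:
        return 0.0
    agree = sum(
        1
        for x, y in pairs
        if (scores_a[x] - scores_a[y]) * (scores_b[x] - scores_b[y]) > 0
    )
    return agree / len(pairs)


def weighted_ranking(new_landmarks, prior_landmarks, prior_ranks):
    """Order predictors by similarity-weighted mean rank over prior datasets."""
    totals, weights = {}, {}
    for dataset, ranks in prior_ranks.items():
        w = pairwise_agreement(new_landmarks, prior_landmarks[dataset])
        for predictor, rank in ranks.items():
            totals[predictor] = totals.get(predictor, 0.0) + w * rank
            weights[predictor] = weights.get(predictor, 0.0) + w
    # Predictors seen only on completely dissimilar datasets (total weight 0) are dropped.
    scored = {p: totals[p] / weights[p] for p in totals if weights[p] > 0}
    return sorted(scored, key=lambda p: (scored[p], p))


# Hypothetical landmarker accuracies (higher is better).
new_landmarks = {"decision_stump": 0.9, "naive_bayes": 0.7, "one_nn": 0.5}
prior_landmarks = {
    "dataset_a": {"decision_stump": 0.8, "naive_bayes": 0.6, "one_nn": 0.4},
    "dataset_b": {"decision_stump": 0.3, "naive_bayes": 0.5, "one_nn": 0.7},
}
# Hypothetical predictor ranks on each prior dataset (1 = best).
prior_ranks = {
    "dataset_a": {"RandomForest": 2, "J48": 1, "SMO": 3},
    "dataset_b": {"RandomForest": 1, "J48": 3, "SMO": 2},
}

top_tier = weighted_ranking(new_landmarks, prior_landmarks, prior_ranks)[:2]
print(top_tier)  # ['J48', 'RandomForest']
```

Here `dataset_a` orders the landmarkers exactly as the new dataset does (weight 1.0), while `dataset_b` reverses the order (weight 0.0). The recommended top tier therefore follows `dataset_a`'s ranks.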

Code Implementations: 2 repos
