DC, LG · Aug 27, 2019

A Framework for Model Search Across Multiple Machine Learning Implementations

arXiv:1908.10310v1 · 3 citations
AI Analysis

This addresses efficiency challenges for data scientists performing model searches, though it is an incremental improvement over existing automation tools.

The authors tackle the time-consuming problem of searching for models across multiple machine learning implementations with a framework built on unified coding and profile-based scheduling. Adding a new ML implementation to the framework required only 55-144 lines of code, and the framework was the fastest on the HIGGS dataset and the second-fastest on the SECOM dataset.
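To make "unified coding" concrete, here is a minimal sketch of how a common interface lets one search loop drive several ML implementations. It is not the authors' API: the `MLImplementation` base class, the toy `MajorityClass` and `Threshold` adapters, and `model_search` are all hypothetical names chosen for illustration, and the toy adapters stand in for wrappers around real libraries.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Tuple


class MLImplementation(ABC):
    """Hypothetical common interface between the framework and one ML library."""

    @abstractmethod
    def fit(self, X: List[List[float]], y: List[int], params: Dict[str, Any]) -> None: ...

    @abstractmethod
    def predict(self, X: List[List[float]]) -> List[int]: ...


class MajorityClass(MLImplementation):
    """Toy adapter: always predicts the most frequent training label."""

    def fit(self, X, y, params):
        self.label = max(set(y), key=y.count)

    def predict(self, X):
        return [self.label for _ in X]


class Threshold(MLImplementation):
    """Toy adapter: predicts 1 when feature `col` exceeds threshold `t`."""

    def fit(self, X, y, params):
        self.col, self.t = params["col"], params["t"]

    def predict(self, X):
        return [1 if row[self.col] > self.t else 0 for row in X]


def accuracy(pred: List[int], y: List[int]) -> float:
    return sum(p == t for p, t in zip(pred, y)) / len(y)


def model_search(impls, configs, X_train, y_train, X_val, y_val) -> Tuple[float, str, Dict]:
    """Fit every (implementation, hyperparameters) pair through the same interface
    and return the best (score, implementation name, params) triple."""
    results = []
    for impl_name, params in configs:
        model = impls[impl_name]()
        model.fit(X_train, y_train, params)
        results.append((accuracy(model.predict(X_val), y_val), impl_name, params))
    return max(results, key=lambda r: r[0])


# Usage: two "implementations", three configurations, one loop.
X = [[0.1], [0.4], [0.6], [0.9]]
y = [0, 0, 1, 1]
impls = {"majority": MajorityClass, "threshold": Threshold}
configs = [("majority", {}), ("threshold", {"col": 0, "t": 0.5}), ("threshold", {"col": 0, "t": 0.8})]
best = model_search(impls, configs, X, y, X, y)
print(best)  # (1.0, 'threshold', {'col': 0, 't': 0.5})
```

The design point is that the search loop never touches library-specific calls or data formats; supporting another library means writing one more adapter class, which is the kind of small, bounded effort the 55-144 line figure refers to.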

Several recently devised machine learning (ML) algorithms have shown improved accuracy on various predictive problems. Model searches, which explore candidate ML algorithms and hyperparameter values to find the best combination for the target problem, play a critical role in such improvements. During a model search, data scientists typically use multiple ML implementations to construct several predictive models; however, employing multiple ML implementations takes significant time and effort, because one must learn how to use each of them, prepare input data in several different formats, and compare their outputs. Our proposed framework addresses these issues by providing a simple, unified coding method. It has two key features: i) new ML implementations can be added easily via common interfaces between the framework and the implementations, and ii) it scales to large model-configuration search spaces via profile-based scheduling. The results of our evaluation indicate that, with our framework, implementers need to write only 55-144 lines of code to add a new ML implementation. They also show that ours was the fastest framework on the HIGGS dataset and the second-fastest on the SECOM dataset.
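The abstract does not spell out how profile-based scheduling works, so the following is only a sketch under stated assumptions. It assumes each configuration is first timed on a small data sample (its "profile"), that runtime scales linearly with row count, and that configurations are then placed on workers with the classic longest-processing-time-first (LPT) greedy rule. LPT is a swapped-in, standard heuristic, not necessarily the authors' scheduler; `estimate_full_cost` and `schedule_by_profile` are hypothetical names.

```python
import heapq
from typing import Dict, List, Tuple


def estimate_full_cost(sample_seconds: float, sample_rows: int, full_rows: int) -> float:
    """Hypothetical profile extrapolation: assumes runtime grows linearly with rows."""
    return sample_seconds * (full_rows / sample_rows)


def schedule_by_profile(est_cost: Dict[str, float], n_workers: int) -> Tuple[List[List[str]], float]:
    """Assign configurations to workers with LPT: most expensive first,
    each to the currently least-loaded worker. Returns plan and predicted makespan."""
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker index)
    heapq.heapify(heap)
    plan: List[List[str]] = [[] for _ in range(n_workers)]
    for cfg in sorted(est_cost, key=est_cost.get, reverse=True):
        load, w = heapq.heappop(heap)  # least-loaded worker
        plan[w].append(cfg)
        heapq.heappush(heap, (load + est_cost[cfg], w))
    makespan = max(load for load, _ in heap)
    return plan, makespan


# Usage: hypothetical timings measured on a 1,000-row sample of a 10,000-row dataset.
sample_timings = {"rf_depth8": 4.0, "rf_depth16": 9.0, "gbm_lr0.1": 6.0, "svm_C1": 3.0, "lr_l2": 1.0}
profile = {cfg: estimate_full_cost(t, 1_000, 10_000) for cfg, t in sample_timings.items()}
plan, makespan = schedule_by_profile(profile, n_workers=2)
print(plan)      # [['rf_depth16', 'svm_C1'], ['gbm_lr0.1', 'rf_depth8', 'lr_l2']]
print(makespan)  # 120.0
```

Without profiles, a scheduler would have to treat every configuration as equally expensive, and one slow configuration landing late could leave other workers idle; placing the costliest configurations first is what keeps the predicted finish times balanced.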
