PL LG SE · May 24, 2019

Type-Driven Automated Learning with Lale

Martin Hirzel, Kiran Kate, Avraham Shinnar, Subhrajit Roy, Parikshit Ram

arXiv:1906.03957v1 · 6.66 citations

Originality: Incremental advance

AI Analysis

The paper addresses the fragmentation of automation tools for data scientists, though its improvement over existing methods is incremental.

The paper tackles the problem of inconsistent syntax and lack of portability in machine-learning automation tools by proposing Lale, an embedded language that uses types for correctness checks and automated search, extending automation across data modalities and programming languages.

Machine-learning automation tools, ranging from humble grid-search to hyperopt, auto-sklearn, and TPOT, help explore large search spaces of possible pipelines. Unfortunately, each of these tools has a different syntax for specifying its search space, leading to lack of portability, missed relevant points, and spurious points that are inconsistent with error checks and documentation of the searchable base components. This paper proposes using types (such as enum, float, or dictionary) both for checking the correctness of, and for automatically searching over, hyperparameters and pipeline configurations. Using types for both of these purposes guarantees consistency. We present Lale, an embedded language that resembles scikit-learn but provides better automation, correctness checks, and portability. Lale extends the reach of existing automation tools across data modalities (tables, text, images, time-series) and programming languages (Python, Java, R). Thus, data scientists can leverage automation while remaining in control of their work.
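The idea of using one type description for both checking and searching can be illustrated with a minimal sketch. It is not Lale's actual API: the schema layout, the hyperparameter names, and the `validate` and `sample` helpers are all hypothetical, chosen only to show how a single source of truth keeps the validator and the search space consistent.

```python
import random

# Hypothetical type description for one component's hyperparameters.
# The layout is an assumption for illustration, not Lale's schema format.
SCHEMA = {
    "penalty": {"type": "enum", "values": ["l1", "l2"]},
    "solver": {"type": "enum", "values": ["lbfgs", "liblinear"]},
    "C": {"type": "float", "min": 0.01, "max": 10.0},
}


def validate(config, schema=SCHEMA):
    """Check a configuration against the schema; return a list of errors."""
    errors = []
    for name in config:
        if name not in schema:
            errors.append(f"unknown hyperparameter: {name}")
    for name, spec in schema.items():
        if name not in config:
            errors.append(f"missing hyperparameter: {name}")
            continue
        value = config[name]
        if spec["type"] == "enum":
            if value not in spec["values"]:
                errors.append(f"{name}: {value!r} not in {spec['values']}")
        elif spec["type"] == "float":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: expected a number, got {value!r}")
            elif not spec["min"] <= value <= spec["max"]:
                errors.append(f"{name}: {value} outside [{spec['min']}, {spec['max']}]")
    return errors


def sample(schema=SCHEMA, rng=None):
    """Draw one random configuration from the same schema used by validate."""
    rng = rng or random.Random()
    config = {}
    for name, spec in schema.items():
        if spec["type"] == "enum":
            config[name] = rng.choice(spec["values"])
        elif spec["type"] == "float":
            config[name] = rng.uniform(spec["min"], spec["max"])
    return config


if __name__ == "__main__":
    candidate = sample(rng=random.Random(0))
    print(candidate, validate(candidate))
    print(validate({"penalty": "l3", "solver": "lbfgs", "C": 100.0}))
```

Because `sample` and `validate` read the same schema, every sampled point passes validation by construction, which is the consistency property the abstract attributes to using types for both purposes. A real system would also need constraints that span several hyperparameters, which this sketch omits.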
