PL, LG, SE | May 24, 2019

Type-Driven Automated Learning with Lale

arXiv:1906.03957v1 | 6 citations
Originality: Incremental advance
AI Analysis

The paper addresses the fragmentation of automation tools for data scientists, though its contribution is an incremental improvement over existing methods.

The paper tackles the problem of inconsistent syntax and lack of portability in machine-learning automation tools by proposing Lale, an embedded language that uses types for correctness checks and automated search, extending automation across data modalities and programming languages.

Machine-learning automation tools, ranging from humble grid search to hyperopt, auto-sklearn, and TPOT, help explore large search spaces of possible pipelines. Unfortunately, each of these tools has a different syntax for specifying its search space, leading to a lack of portability, missed relevant points, and spurious points that are inconsistent with error checks and documentation of the searchable base components. This paper proposes using types (such as enum, float, or dictionary) both for checking the correctness of, and for automatically searching over, hyperparameters and pipeline configurations. Using types for both of these purposes guarantees consistency. We present Lale, an embedded language that resembles scikit-learn but provides better automation, correctness checks, and portability. Lale extends the reach of existing automation tools across data modalities (tables, text, images, time-series) and programming languages (Python, Java, R). Thus, data scientists can leverage automation while remaining in control of their work.
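The central idea of the abstract, one type description serving both as a correctness check and as a search space, can be illustrated with a minimal Python sketch. It uses only the standard library and is not Lale's API: the `SCHEMA` layout, the hyperparameter names and ranges, and the helper functions `validate` and `sample` are all hypothetical, chosen only to show why deriving both operations from the same types keeps them consistent.

```python
import random

# Hypothetical type description of two hyperparameters. The names and ranges
# are illustrative assumptions, not taken from Lale or scikit-learn.
SCHEMA = {
    "penalty": {"type": "enum", "values": ["l1", "l2"]},
    "C": {"type": "float", "min": 0.01, "max": 100.0},
}


def validate(config, schema=SCHEMA):
    """Check a configuration against the schema; return a list of error strings."""
    errors = []
    for name in config:
        if name not in schema:
            errors.append(f"unknown hyperparameter: {name}")
    for name, spec in schema.items():
        if name not in config:
            errors.append(f"missing hyperparameter: {name}")
            continue
        value = config[name]
        if spec["type"] == "enum":
            if value not in spec["values"]:
                errors.append(f"{name}: {value!r} not in {spec['values']}")
        elif spec["type"] == "float":
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                errors.append(f"{name}: expected a number, got {value!r}")
            elif not spec["min"] <= value <= spec["max"]:
                errors.append(f"{name}: {value} outside [{spec['min']}, {spec['max']}]")
    return errors


def sample(schema=SCHEMA, rng=None):
    """Draw one random configuration from the same schema used by validate."""
    if rng is None:
        rng = random.Random()
    config = {}
    for name, spec in schema.items():
        if spec["type"] == "enum":
            config[name] = rng.choice(spec["values"])
        elif spec["type"] == "float":
            config[name] = rng.uniform(spec["min"], spec["max"])
    return config
```

Because `sample` reads the same `SCHEMA` that `validate` checks, every sampled configuration passes validation by construction, and a hand-written configuration such as `{"penalty": "l3", "C": 1.0}` is rejected with a message naming the offending value. A real system would need richer types (nested dictionaries, conditional constraints, whole pipelines), which this sketch leaves out.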

Code Implementations: 2 repos