DB, LG, SE | Jul 29, 2019

sql4ml A declarative end-to-end workflow for machine learning

arXiv:1907.12415v2 | 7 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses workflow fragmentation for data scientists and engineers, though its contribution is incremental because it builds on existing SQL and ML frameworks.

The authors tackle the fragmented workflow between relational databases and machine learning frameworks with sql4ml, a system that lets users express both feature engineering and ML algorithms in SQL and automatically translates that code to TensorFlow for training. They validate the approach experimentally on three well-known algorithms and discuss its usability benefits.

We present sql4ml, a system for expressing supervised machine learning (ML) models in SQL and automatically training them in TensorFlow. The primary motivation for this work stems from the observation that in many data science tasks there is a back-and-forth between a relational database that stores the data and a machine learning framework. Data preprocessing and feature engineering typically happen in a database, whereas learning is usually executed in separate ML libraries. This fragmented workflow requires users to juggle different programming paradigms and software systems. With sql4ml the user can express both feature engineering and ML algorithms in SQL, while the system translates this code to an appropriate representation for training inside a machine learning framework. We describe our translation method, present experimental results from applying it to three well-known ML algorithms and discuss the usability benefits of concentrating the entire workflow on the database side.
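To make the idea of writing an ML model in SQL concrete, here is a minimal sketch of a linear-regression loss expressed as a query. It is an illustration only, not sql4ml's actual syntax or translation output: the table layout (`features`, `targets`, `weights`), the column names, and the toy data are all assumptions, SQLite stands in for the relational database, and plain Python stands in for the tensor computation that a framework such as TensorFlow would perform. The point is that a join followed by `SUM ... GROUP BY` computes the same thing as a matrix-vector product.

```python
import sqlite3

# Toy data (hypothetical): 3 observations, 2 features, fixed weights.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [5.0, 11.0, 17.0]
w = [1.0, 1.5]
names = ["f1", "f2"]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Assumed relational layout: one row per (observation, feature) pair.
cur.executescript("""
CREATE TABLE features (rowID INTEGER, featureName TEXT, v REAL);
CREATE TABLE targets  (rowID INTEGER, v REAL);
CREATE TABLE weights  (featureName TEXT, v REAL);
""")
cur.executemany(
    "INSERT INTO features VALUES (?, ?, ?)",
    [(i, names[j], X[i][j]) for i in range(len(X)) for j in range(len(names))],
)
cur.executemany("INSERT INTO targets VALUES (?, ?)", list(enumerate(y)))
cur.executemany("INSERT INTO weights VALUES (?, ?)", list(zip(names, w)))

# Model and objective in SQL: predictions are a join plus a grouped sum,
# and the loss is the mean squared error against the targets.
sql_loss = cur.execute("""
WITH predictions AS (
    SELECT f.rowID AS rowID, SUM(f.v * w.v) AS v
    FROM features f JOIN weights w ON f.featureName = w.featureName
    GROUP BY f.rowID
)
SELECT AVG((p.v - t.v) * (p.v - t.v))
FROM predictions p JOIN targets t ON p.rowID = t.rowID
""").fetchone()[0]

# The same computation in array form: X @ w, then mean squared error.
preds = [sum(xi * wi for xi, wi in zip(row, w)) for row in X]
py_loss = sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

print(sql_loss, py_loss)
conn.close()
```

Both expressions evaluate the same objective, which is what makes an automatic translation from the relational form to a tensor form plausible. A real system would additionally need to update the weights by gradient descent, which this sketch leaves out.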

Code Implementations: 2 repos