DB · LG · Jan 19, 2020

SQLFlow: A Bridge between SQL and Machine Learning

arXiv:2001.06846v1 · 12 citations
AI Analysis

SQLFlow gives developers of industrial AI systems a unified SQL interface for building ML workflows. The contribution is incremental, since it builds on existing SQL and ML technologies.

The authors address the challenge of developing end-to-end machine learning workflows efficiently. They introduce SQLFlow, a system that bridges SQL and various ML engines so that developers can write concise SQL programs for tasks such as training, prediction, and data processing. SQLFlow compiles SQL into Kubernetes-native workflows and has been adopted by industrial users such as Ant Financial and Alibaba Group.

Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs that focus on the purpose (what) and ignore the procedure (how). Previous database systems extended their SQL dialects to support ML. SQLFlow (https://sqlflow.org/sqlflow) takes another strategy: it works as a bridge between various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines such as TensorFlow, XGBoost, and scikit-learn. We extended the SQL syntax carefully so that the extension works with various SQL dialects. We implement the extension by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive across a wide variety of ML techniques: supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data processing and feature extraction in addition to ML. SQLFlow compiles a SQL program into a Kubernetes-native workflow for fault-tolerant execution and on-cloud deployment. Current industrial users include Ant Financial, DiDi, and Alibaba Group.
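
The abstract does not spell out the extended syntax or the collaborative parsing algorithm, so the following is only a rough illustration of the general idea of leaving standard SQL to the database's own dialect and handling the ML extension separately. It assumes the extension begins at a `TO TRAIN`, `TO PREDICT`, or `TO EXPLAIN` clause. The function `split_statement`, the `SplitStatement` type, and the table and model names are all hypothetical. A regular expression stands in for real parsers here, so this sketch would be fooled by those keywords appearing inside a string literal. It is not SQLFlow's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed keywords that mark where the ML extension begins (an assumption,
# not taken from the abstract).
_EXTENSION = re.compile(r"\bTO\s+(TRAIN|PREDICT|EXPLAIN)\b", re.IGNORECASE)


@dataclass
class SplitStatement:
    standard_sql: str             # would be handed to the database's own parser
    action: Optional[str] = None  # TRAIN / PREDICT / EXPLAIN, or None for plain SQL
    extension: str = ""           # text following the action keyword


def split_statement(stmt: str) -> SplitStatement:
    """Split one statement into its standard-SQL prefix and its ML extension."""
    text = stmt.strip().rstrip(";").strip()
    match = _EXTENSION.search(text)
    if match is None:
        # No extension clause: the whole statement is ordinary SQL.
        return SplitStatement(standard_sql=text)
    return SplitStatement(
        standard_sql=text[: match.start()].strip(),
        action=match.group(1).upper(),
        extension=text[match.end():].strip(),
    )


# Hypothetical statement: table, model, and attribute names are made up.
example = split_statement(
    "SELECT * FROM iris.train TO TRAIN DNNClassifier "
    "WITH model.n_classes = 3 LABEL class INTO my_model;"
)
```

In this sketch, `example.standard_sql` is the part a MySQL, Hive, or MaxCompute parser could validate on its own, while `example.action` and `example.extension` would go to whichever component translates the ML clause for an engine such as TensorFlow or XGBoost.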

Code Implementations: 1 repo