LG, DB · Dec 13, 2021

What can Data-Centric AI Learn from Data and ML Engineering?

arXiv:2112.06439v1 · 56 citations
Originality: Synthesis-oriented
AI Analysis

The paper addresses the challenge organizations face in managing data-centric applications. Its contribution is incremental, since it builds on existing engineering practices rather than proposing new techniques.

The paper discusses how lessons from data and ML engineering could be applied to data-centric AI. It draws on the authors' experience building platforms that serve thousands of applications, with the aim of improving data quality and the management of data-centric systems.

Data-centric AI is a new and exciting research topic in the AI community, but many organizations already build and maintain various "data-centric" applications whose goal is to produce high quality data. These range from traditional business data processing applications (e.g., "how much should we charge each of our customers this month?") to production ML systems such as recommendation engines. The fields of data and ML engineering have arisen in recent years to manage these applications, and both include many interesting novel tools and processes. In this paper, we discuss several lessons from data and ML engineering that could be interesting to apply in data-centric AI, based on our experience building data and ML platforms that serve thousands of applications at a range of organizations.
