LG · NE · CT · ML · Oct 6, 2021

Data-Centric AI Requires Rethinking Data Notion

arXiv:2110.02491v4 · 17 citations
AI Analysis

This work addresses a foundational, theoretical problem for AI researchers and practitioners by rethinking the notion of data. Its contribution is incremental in that it builds on existing mathematical concepts rather than introducing new ones.

The paper tackles the need for a unified definition of data in data-centric AI by proposing categorical and cochain notions as unifying principles, which could impact the development and use of machine learning packages.

The transition towards data-centric AI requires revisiting data notions from mathematical and implementational standpoints to obtain unified data-centric machine learning packages. Towards this end, this work proposes unifying principles offered by categorical and cochain notions of data, and discusses the importance of these principles in the data-centric AI transition. In the categorical notion, data is viewed as a mathematical structure that we act upon via morphisms that preserve this structure. In the cochain notion, data is viewed as a function defined on a discrete domain of interest and acted upon via operators. While these notions are almost orthogonal, they provide a unifying definition of data, ultimately impacting the way machine learning packages are developed, implemented, and utilized by practitioners.
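
To make the two notions concrete, here is a minimal Python sketch on a three-vertex path graph. It is an illustration under assumptions, not code from the paper: the names `coboundary`, `preserves_edges`, `signal`, and `relabel` are hypothetical, and the graph, the vertex values, and the choice of the difference operator are picked only for the example.

```python
# Hypothetical illustration; none of these names come from the paper.

# Discrete domain: a path graph on 3 vertices with oriented edges.
vertices = [0, 1, 2]
edges = [(0, 1), (1, 2)]

# Cochain notion: data is a function on the domain (one value per vertex).
signal = {0: 1.0, 1: 4.0, 2: 9.0}

def coboundary(f, edges):
    """Operator on a vertex function: maps it to differences along edges."""
    return {(u, v): f[v] - f[u] for (u, v) in edges}

edge_signal = coboundary(signal, edges)

# Categorical notion: data is a structure (the graph itself), acted on by
# structure-preserving maps. Here: a vertex relabeling that reverses the path.
relabel = {0: 2, 1: 1, 2: 0}

def preserves_edges(phi, edges):
    """Check that phi sends every edge to an edge (orientation ignored)."""
    undirected = {frozenset(e) for e in edges}
    return all(frozenset((phi[u], phi[v])) in undirected for (u, v) in edges)

is_morphism = preserves_edges(relabel, edges)
```

In this sketch the cochain view treats `signal` as the data and `coboundary` as an operator acting on it, while the categorical view treats the graph as the data and `relabel` as a map that must keep its edge structure intact.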

