LG · AI · DC · SP · Nov 21, 2023

DMLR: Data-centric Machine Learning Research -- Past, Present and Future

MIT
arXiv:2311.13028v2 · 18 citations · h-index: 74
Originality: Synthesis-oriented
AI Analysis

The paper addresses the need for better datasets to improve ML research, but its contribution is incremental because it builds on existing workshop discussions.

The paper outlines the importance of community engagement and infrastructure development for creating next-generation public datasets to advance machine learning science, proposing a collective path forward for their creation and maintenance.

Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science. We chart a path forward as a collective effort to sustain the creation and maintenance of these datasets and methods towards positive scientific, societal and business impact.

