DC | LG | Jun 27, 2023

DataCI: A Platform for Data-Centric AI on Streaming Data

arXiv:2306.15538v2 | h-index: 13 | Has Code
Originality: Synthesis-oriented
AI Analysis

This platform addresses the problem of managing and evaluating data-centric AI pipelines for researchers and practitioners working with streaming data. It is an incremental tooling contribution rather than a fundamental breakthrough.

The authors tackled the challenge of implementing data-centric AI in dynamic streaming data environments by developing DataCI, an open-source platform that provides infrastructure, version control, and a graphical interface for managing streaming datasets and pipelines. Preliminary studies are said to demonstrate the platform's ease of use and effectiveness, but no concrete performance numbers were provided.

We introduce DataCI, a comprehensive open-source platform designed specifically for data-centric AI in dynamic streaming data settings. DataCI provides 1) an infrastructure with rich APIs for seamless streaming dataset management and for data-centric pipeline development and evaluation in streaming scenarios, 2) a carefully designed version control function to track pipeline lineage, and 3) an intuitive graphical interface for a better interactive user experience. Preliminary studies and demonstrations attest to the ease of use and effectiveness of DataCI, highlighting its potential to revolutionize the practice of data-centric AI in streaming data contexts.
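To make the idea of versioned streaming datasets with pipeline lineage more concrete, here is a minimal sketch in Python. It is not DataCI's actual API: `LineageStore`, `commit_dataset`, `commit_run`, and `lineage` are hypothetical names invented for this illustration, and the in-memory dictionary stands in for whatever storage a real platform would use. The sketch assumes that each new batch of streaming data produces a new dataset version that points back to its parent, and that each pipeline run records the dataset version it consumed.

```python
import hashlib
import json
from dataclasses import dataclass, field


def content_version(payload) -> str:
    """Derive a short, deterministic version id from JSON-serialisable content."""
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:8]


@dataclass
class LineageStore:
    """Hypothetical in-memory store for illustration; not DataCI's API."""
    records: dict = field(default_factory=dict)

    def commit_dataset(self, name, rows, parent=None):
        # A new snapshot of a streaming dataset; `parent` links to the previous one.
        version = content_version({"name": name, "rows": rows, "parent": parent})
        self.records[version] = {
            "kind": "dataset",
            "name": name,
            "parents": [parent] if parent else [],
        }
        return version

    def commit_run(self, pipeline, code, input_version):
        # A pipeline run is identified by its code and the dataset version it read.
        version = content_version(
            {"pipeline": pipeline, "code": code, "input": input_version}
        )
        self.records[version] = {
            "kind": "run",
            "name": pipeline,
            "parents": [input_version],
        }
        return version

    def lineage(self, version):
        """Walk parent links back to the root, newest first."""
        chain = []
        stack = [version]
        while stack:
            current = stack.pop()
            chain.append(current)
            stack.extend(self.records[current]["parents"])
        return chain


# Usage: two successive snapshots of a stream, then a pipeline run on the latest.
store = LineageStore()
v1 = store.commit_dataset("reviews", [{"text": "good", "label": 1}])
v2 = store.commit_dataset(
    "reviews",
    [{"text": "good", "label": 1}, {"text": "bad", "label": 0}],
    parent=v1,
)
run = store.commit_run("dedup_then_train", "def clean(rows): ...", v2)
chain = store.lineage(run)  # run id, then v2, then v1
```

Because the version ids are derived from content, committing the same rows twice yields the same id, so a run can be traced back to exactly the data it saw even as the stream keeps growing.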

