The Principles of Data-Centric AI (DCAI)
This foundational work addresses data quality issues in AI deployments for researchers and practitioners. Its contribution is incremental, however, because it synthesizes existing perspectives into principles.
The paper argues that AI systems have been overly model-centric and introduces data-centric AI (DCAI), an emerging concept that prioritizes data quality and dynamism. It outlines six guiding principles for shifting focus toward data in order to improve performance in real-world applications.
Data is crucial infrastructure for how artificial intelligence (AI) systems learn. However, these systems have to date been largely model-centric, putting a premium on the model at the expense of data quality. Data quality issues beset the performance of AI systems, particularly in downstream deployments and real-world applications. Data-centric AI (DCAI), as an emerging concept, brings data, its quality, and its dynamism to the forefront of how AI systems are considered, through an iterative and systematic approach. As one of the first overviews, this article brings together data-centric perspectives and concepts to outline the foundations of DCAI. Specifically, it formulates six guiding principles for researchers and practitioners and gives direction for the future advancement of DCAI.