AI | Oct 7, 2021

A Data-Centric Approach for Training Deep Neural Networks with Less Data

Mohammad Motamedi, Nikolay Sakharnykh, Tim Kaldewey

arXiv:2110.03613v2 | 21.971 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of data scarcity for AI practitioners, offering a data-centric solution that is incremental in nature.

The paper tackles the problem of training deep neural networks with limited data by enhancing data quality and generating new samples, achieving a 5% accuracy improvement with a smaller dataset.

While the availability of large datasets is perceived to be a key requirement for training deep neural networks, it is possible to train such models with relatively little data. However, compensating for the absence of large datasets demands a series of actions to enhance the quality of the existing samples and to generate new ones. This paper summarizes our winning submission to the "Data-Centric AI" competition. We discuss some of the challenges that arise while training with a small dataset, offer a principled approach for systematic data quality enhancement, and propose a GAN-based solution for synthesizing new data points. Our evaluations indicate that the dataset generated by the proposed pipeline offers a 5% accuracy improvement while being significantly smaller than the baseline.
