LG · CV · Jun 10, 2019

From Data Quality to Model Quality: an Exploratory Study on Deep Learning

arXiv:1906.11882v1 · 30 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a question relevant to deep learning practitioners: how data quality affects model performance. It is incremental, however, because it explores known factors without introducing new methods.

The study investigated how four data quality aspects (Dataset Equilibrium, Dataset Size, Quality of Label, and Dataset Contamination) affect deep learning model accuracy on MNIST and CIFAR-10, finding that a decrease in any of these aspects reduces model accuracy.

Much current work strives to improve the accuracy of deep learning models, but very little has focused on the quality of the datasets. In fact, data quality determines model quality, so it is important to study how data quality affects model quality. In this paper, we consider four aspects of data quality: Dataset Equilibrium, Dataset Size, Quality of Label, and Dataset Contamination. We design experiments on MNIST and CIFAR-10 to determine the influence of each of the four aspects on model quality. Experimental results show that all four aspects have a decisive impact on model quality: a decrease in data quality in any of these aspects reduces model accuracy.
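The four aspects named in the abstract can each be read as a controlled degradation of a training set. The sketch below is only an illustration of that idea on NumPy arrays, not the procedure used in the paper, which this summary does not describe. All function names and parameters (`reduce_size`, `unbalance`, `flip_labels`, `contaminate`, `sigma`, and so on) are hypothetical, and treating "contamination" as additive Gaussian pixel noise is an assumption.

```python
import numpy as np

def reduce_size(x, y, fraction, rng):
    """Dataset Size: keep a uniformly random fraction of the samples."""
    n_keep = int(round(len(y) * fraction))
    idx = rng.choice(len(y), size=n_keep, replace=False)
    return x[idx], y[idx]

def unbalance(x, y, target_class, keep_fraction, rng):
    """Dataset Equilibrium: keep only part of one class, leave the rest intact."""
    cls_idx = np.flatnonzero(y == target_class)
    n_drop = len(cls_idx) - int(round(len(cls_idx) * keep_fraction))
    drop = rng.choice(cls_idx, size=n_drop, replace=False)
    mask = np.ones(len(y), dtype=bool)
    mask[drop] = False
    return x[mask], y[mask]

def flip_labels(y, noise_rate, num_classes, rng):
    """Quality of Label: reassign a fraction of labels to a different class."""
    y_noisy = y.copy()
    n_flip = int(round(len(y) * noise_rate))
    idx = rng.choice(len(y), size=n_flip, replace=False)
    # An offset in 1..num_classes-1 guarantees the new label differs from the old one.
    offsets = rng.integers(1, num_classes, size=n_flip)
    y_noisy[idx] = (y[idx] + offsets) % num_classes
    return y_noisy

def contaminate(x, rate, sigma, rng):
    """Dataset Contamination (assumed meaning): add Gaussian noise to a fraction
    of the images. Expects pixel values scaled to [0, 1]."""
    x_noisy = x.astype(np.float32, copy=True)
    n_noisy = int(round(len(x) * rate))
    idx = rng.choice(len(x), size=n_noisy, replace=False)
    noise = rng.normal(0.0, sigma, size=x_noisy[idx].shape).astype(np.float32)
    x_noisy[idx] = x_noisy[idx] + noise
    return np.clip(x_noisy, 0.0, 1.0)

if __name__ == "__main__":
    # Synthetic stand-in for an MNIST-shaped dataset (random pixels, 10 balanced classes).
    rng = np.random.default_rng(0)
    x = rng.random((1000, 28, 28)).astype(np.float32)
    y = np.repeat(np.arange(10), 100)

    x_small, y_small = reduce_size(x, y, fraction=0.5, rng=rng)
    x_unbal, y_unbal = unbalance(x, y, target_class=3, keep_fraction=0.1, rng=rng)
    y_noisy = flip_labels(y, noise_rate=0.2, num_classes=10, rng=rng)
    x_dirty = contaminate(x, rate=0.3, sigma=0.5, rng=rng)
    print(len(y_small), np.bincount(y_unbal), (y_noisy != y).mean())
```

Training the same model on each degraded variant while sweeping one parameter at a time (for example `noise_rate` from 0 to 0.5) would give accuracy curves of the kind the abstract summarizes.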
