Identifying Incorrect Annotations in Multi-Label Classification Data
This work addresses data quality issues faced by practitioners of multi-label classification (e.g., image or document tagging), but it is incremental, since it builds on existing frameworks.
The paper tackles the problem of identifying mislabeled examples in multi-label classification datasets by extending the Confident Learning framework and proposing a label quality score. The resulting methods empirically outperform other label error detection algorithms and uncover many label errors in the CelebA dataset.
In multi-label classification, each example in a dataset may be annotated as belonging to one or more classes (or none of the classes). Example applications include image (or document) tagging where each possible tag either applies to a particular image (or document) or not. With many possible classes to consider, data annotators are likely to make errors when labeling such data in practice. Here we consider algorithms for finding mislabeled examples in multi-label classification datasets. We propose an extension of the Confident Learning framework to this setting, as well as a label quality score that ranks examples with label errors much higher than those which are correctly labeled. Both approaches can utilize any trained classifier. After demonstrating that our methodology empirically outperforms other algorithms for label error detection, we apply our approach to discover many label errors in the CelebA image tagging dataset.
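The abstract does not spell out how either method works. As a rough illustration only, the sketch below assumes a classifier that outputs one independent probability per class (an N×K matrix `pred_probs`) alongside the given binary annotations `labels`. It computes (a) a hypothetical label quality score, the per-class self-confidence (predicted probability of the annotated value) aggregated by taking the minimum over classes, and (b) a one-vs-rest, Confident-Learning-style check that flags an example when, for some class, the model's confidently estimated label disagrees with the annotation. The function names, the minimum aggregation, and the per-class binary decomposition are all assumptions for illustration, not the paper's exact method.

```python
import numpy as np


def per_class_self_confidence(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """Predicted probability of the *given* annotation for each (example, class).

    labels: (N, K) binary matrix of annotations.
    pred_probs: (N, K) predicted probability that each class applies.
    Returns p where the class was tagged and 1 - p where it was not.
    """
    return np.where(labels == 1, pred_probs, 1.0 - pred_probs)


def label_quality_scores(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """Hypothetical score: an example is only as trustworthy as its
    least-confident class annotation, so take the minimum over classes.
    Lower scores indicate likelier label errors."""
    return per_class_self_confidence(labels, pred_probs).min(axis=1)


def confident_learning_flags(labels: np.ndarray, pred_probs: np.ndarray) -> np.ndarray:
    """Illustrative one-vs-rest Confident-Learning-style detector.

    Each class is treated as a binary problem (0 = absent, 1 = present).
    Per-label thresholds are the mean self-confidence among examples given
    that label; an example lands off the diagonal of the binary confident
    joint when the most probable above-threshold label differs from the
    annotation. Returns one boolean flag per example (any class flagged).
    """
    n, k = labels.shape
    flags = np.zeros((n, k), dtype=bool)
    for c in range(k):
        y = labels[:, c]
        # Column 0: probability the class is absent; column 1: present.
        probs = np.stack([1.0 - pred_probs[:, c], pred_probs[:, c]], axis=1)
        # If no example carries a given label, its threshold can never be met.
        thresholds = np.array([
            probs[y == b, b].mean() if np.any(y == b) else np.inf
            for b in (0, 1)
        ])
        above = probs >= thresholds  # (n, 2): which labels pass their threshold
        masked = np.where(above, probs, -np.inf)
        # Confidently estimated true label, or -1 if no label passes.
        guess = np.where(above.any(axis=1), masked.argmax(axis=1), -1)
        flags[:, c] = (guess != -1) & (guess != y)
    return flags.any(axis=1)


if __name__ == "__main__":
    # Toy data: example 4 is tagged [1, 0] but the model strongly predicts [0, 1].
    labels = np.array([[1, 0], [1, 0], [0, 1], [0, 1], [1, 0]])
    pred_probs = np.array(
        [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8], [0.05, 0.95]]
    )
    scores = label_quality_scores(labels, pred_probs)
    print("ranked (most suspicious first):", np.argsort(scores))
    print("flagged:", confident_learning_flags(labels, pred_probs))
```

Sorting by ascending score puts the most suspicious examples first, which matches the ranking use the abstract describes. The minimum is the simplest aggregation that lets a single badly labeled class dominate; smoother alternatives (e.g. averaging the few lowest per-class scores) would trade some sensitivity for robustness to noisy predictions on individual classes.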