LG CV Dec 3, 2024

Class-wise Autoencoders Measure Classification Difficulty And Detect Label Mistakes

Jacob Marks, Brent A. Griffin, Jason J. Corso

arXiv:2412.02596v12.6 h-index: 12 Has Code

Originality Highly original

AI Analysis

The paper provides a tool for dataset analysis and label error detection that is particularly useful for computer vision researchers and practitioners, though it is incremental in that it builds on existing autoencoder methods.

The paper tackles the analysis of classification datasets by introducing a framework based on reconstruction error ratios (RERs) from class-wise autoencoders. The ratios measure classification difficulty and decompose it into (1) finite sample size and (2) Bayes error and decision-boundary complexity. Across 19 visual datasets, the resulting difficulty measure correlates strongly with the error rates of SOTA classification models, and the method achieves SOTA mislabel detection performance on hard datasets under symmetric and asymmetric label noise.

We introduce a new framework for analyzing classification datasets based on the ratios of reconstruction errors between autoencoders trained on individual classes. This analysis framework enables efficient characterization of datasets on the sample, class, and entire dataset levels. We define reconstruction error ratios (RERs) that probe classification difficulty and allow its decomposition into (1) finite sample size and (2) Bayes error and decision-boundary complexity. Through systematic study across 19 popular visual datasets, we find that our RER-based dataset difficulty probe strongly correlates with error rate for state-of-the-art (SOTA) classification models. By interpreting sample-level classification difficulty as a label mistakenness score, we further find that RERs achieve SOTA performance on mislabel detection tasks on hard datasets under symmetric and asymmetric label noise. Our code is publicly available at https://github.com/voxel51/reconstruction-error-ratios.
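
To make the idea of a reconstruction error ratio concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation (the linked repository is the reference for that). It makes two assumptions that the abstract does not state: each class-wise autoencoder is replaced by a rank-k linear autoencoder (PCA) fit on raw feature vectors, and the ratio for a sample is its reconstruction error under the autoencoder of its labeled class divided by the smallest error under any other class. All function names are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(X, k):
    """Fit a rank-k linear autoencoder (PCA) to the rows of X.

    Assumption: a linear stand-in for the class-wise autoencoders.
    Returns the class mean and the top-k principal directions.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def reconstruction_error(X, model):
    """Squared reconstruction error of each row of X under one model."""
    mean, components = model
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


def reconstruction_error_ratios(X, y, k=1, eps=1e-12):
    """Per-sample ratio: error under the labeled class / best other class.

    Assumption: this ratio form is illustrative; values above 1 mean some
    other class reconstructs the sample better than its labeled class.
    """
    classes, label_idx = np.unique(y, return_inverse=True)
    models = [fit_linear_autoencoder(X[y == c], k) for c in classes]
    # errors[i, j] = error of sample i under the autoencoder of class j
    errors = np.stack([reconstruction_error(X, m) for m in models], axis=1)
    rows = np.arange(len(y))
    own = errors[rows, label_idx]
    others = errors.copy()
    others[rows, label_idx] = np.inf  # exclude the labeled class
    return own / (others.min(axis=1) + eps)


# Toy data: two classes lying near parallel lines in 2D.
t = np.linspace(-10.0, 10.0, 50)
class0 = np.column_stack([t, 0.1 * np.sin(3 * t)])
class1 = np.column_stack([t, 10.0 + 0.1 * np.cos(3 * t)])

# One extra point that sits on the class-1 line but carries label 0.
mislabeled = np.array([[0.0, 10.0]])

X = np.vstack([class0, class1, mislabeled])
y = np.array([0] * 50 + [1] * 50 + [0])

rer = reconstruction_error_ratios(X, y, k=1)
print("index of largest ratio:", int(np.argmax(rer)))
```

In this toy setup the deliberately mislabeled point (the last row) is reconstructed far better by the other class's autoencoder, so its ratio exceeds 1, while the correctly labeled points stay below 1. Ranking samples by this ratio is the sense in which the abstract treats sample-level difficulty as a label mistakenness score.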
