CL, AI, LG | Oct 16, 2021

Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information

arXiv:2110.08420v3 | 341 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for interpretable assessment of dataset difficulty in machine learning, particularly in NLP. It is an incremental advance because it builds on the existing concept of V-usable information.

The paper frames dataset difficulty as a lack of V-usable information and introduces pointwise V-information (PVI) to measure the difficulty of individual instances. For a given model, this enables comparisons across datasets and across instances or slices of the same dataset. The authors also apply transformations to the input to discover annotation artefacts in widely-used NLP benchmarks.

Estimating the difficulty of a dataset typically involves comparing state-of-the-art models to humans; the bigger the performance gap, the harder the dataset is said to be. However, this comparison provides little understanding of how difficult each instance in a given distribution is, or what attributes make the dataset difficult for a given model. To address these questions, we frame dataset difficulty -- w.r.t. a model $\mathcal{V}$ -- as the lack of $\mathcal{V}$-$\textit{usable information}$ (Xu et al., 2019), where a lower value indicates a more difficult dataset for $\mathcal{V}$. We further introduce $\textit{pointwise $\mathcal{V}$-information}$ (PVI) for measuring the difficulty of individual instances w.r.t. a given distribution. While standard evaluation metrics typically only compare different models for the same dataset, $\mathcal{V}$-$\textit{usable information}$ and PVI also permit the converse: for a given model $\mathcal{V}$, we can compare different datasets, as well as different instances/slices of the same dataset. Furthermore, our framework allows for the interpretability of different input attributes via transformations of the input, which we use to discover annotation artefacts in widely-used NLP benchmarks.
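The abstract does not spell out how PVI is computed, so the following is a minimal sketch rather than the authors' implementation. It assumes the definition used in the paper: the PVI of an instance is the log-probability (in bits) that a model fine-tuned with the input assigns to the gold label, minus the log-probability assigned by a model fine-tuned with a null (empty) input. The $\mathcal{V}$-usable information of a dataset can then be estimated as the mean PVI over held-out instances. The function names `pvi` and `v_information` and the probabilities below are hypothetical, chosen only for illustration.

```python
import math
from typing import Iterable, Tuple


def pvi(p_with_input: float, p_null_input: float) -> float:
    """Pointwise V-information of one instance, in bits.

    p_with_input: gold-label probability from a model trained on (input, label) pairs.
    p_null_input: gold-label probability from a model trained on (null input, label) pairs.
    Both arguments are assumed to be strictly positive probabilities.
    """
    return math.log2(p_with_input) - math.log2(p_null_input)


def v_information(pairs: Iterable[Tuple[float, float]]) -> float:
    """Estimate V-usable information as the mean PVI over held-out instances."""
    scores = [pvi(p, q) for p, q in pairs]
    return sum(scores) / len(scores)


# Illustrative (made-up) gold-label probabilities for three held-out instances.
examples = [(0.9, 0.5), (0.5, 0.5), (0.25, 0.5)]
scores = [pvi(p, q) for p, q in examples]
dataset_estimate = v_information(examples)
```

Under this sketch, a higher PVI means the input makes the gold label easier for the model to predict. A PVI of zero means the input adds nothing beyond the label prior, and a negative PVI marks an instance the model handles worse with the input than without it, which may indicate a mislabelled or unusually hard example.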

Code Implementations: 1 repo