CV · LG · Dec 21, 2019

Measuring Dataset Granularity

arXiv:1912.10154v1 · 13 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the lack of a precise definition of dataset granularity in machine learning. The advance is incremental: it builds on clustering theory to formalize the concept.

The authors define and measure dataset granularity through an axiomatic framework, which they evaluate on datasets with hierarchical labels. They find that some datasets widely considered fine-grained contain sizable coarse-grained subsets, and that fine-grained datasets are harder to learn from, harder to transfer to, harder for few-shot learning, and more vulnerable to adversarial attacks.

Despite the increasing visibility of fine-grained recognition in our field, "fine-grained" has thus far lacked a precise definition. In this work, building upon clustering theory, we pursue a framework for measuring dataset granularity. We argue that dataset granularity should depend not only on the data samples and their labels, but also on the distance function we choose. We propose an axiomatic framework to capture desired properties for a dataset granularity measure and provide examples of measures that satisfy these properties. We assess each measure via experiments on datasets with hierarchical labels of varying granularity. When measuring granularity in commonly used datasets with our measure, we find that certain datasets that are widely considered fine-grained in fact contain subsets of considerable size that are substantially more coarse-grained than datasets generally regarded as coarse-grained. We also investigate the interplay between dataset granularity and a variety of factors and find that fine-grained datasets are more difficult to learn from, more difficult to transfer to, more difficult to perform few-shot learning with, and more vulnerable to adversarial attacks.
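
The abstract's claim that granularity depends on the samples, their labels, and the chosen distance function can be illustrated with a toy score. The sketch below is an assumption made for illustration only and is not one of the paper's measures: `toy_granularity`, `euclidean`, and `manhattan` are hypothetical names, and the score is simply the mean distance from each sample to its own class centroid divided by the mean distance to the nearest other class centroid. Under this convention a larger value means classes sit closer together relative to their spread, i.e. a finer-grained labeling.

```python
import math
from collections import defaultdict
from typing import Callable, Hashable, Sequence

Vector = Sequence[float]
Distance = Callable[[Vector, Vector], float]


def euclidean(a: Vector, b: Vector) -> float:
    """L2 distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def toy_granularity(
    samples: Sequence[Vector],
    labels: Sequence[Hashable],
    dist: Distance = euclidean,
) -> float:
    """Hypothetical illustrative score, NOT a measure from the paper.

    Returns (mean distance to own class centroid) divided by
    (mean distance to the nearest other class centroid).
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")

    # Group samples by label.
    groups = defaultdict(list)
    for sample, label in zip(samples, labels):
        groups[label].append(sample)
    if len(groups) < 2:
        raise ValueError("need at least two classes")

    # Class centroid = coordinate-wise mean of the class members.
    centroids = {
        label: tuple(sum(col) / len(points) for col in zip(*points))
        for label, points in groups.items()
    }

    own_total = 0.0
    other_total = 0.0
    for sample, label in zip(samples, labels):
        own_total += dist(sample, centroids[label])
        other_total += min(
            dist(sample, c) for other, c in centroids.items() if other != label
        )
    if other_total == 0:
        raise ValueError("all class centroids coincide with the samples")

    # Both totals cover the same samples, so the ratio of totals
    # equals the ratio of means.
    return own_total / other_total


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    far_apart = [(0, 0), (0, 1), (10, 0), (10, 1)]
    close_together = [(0, 0), (0, 1), (1, 0), (1, 1)]
    print("far apart, L2:     ", toy_granularity(far_apart, labels))
    print("close together, L2:", toy_granularity(close_together, labels))
    print("close together, L1:", toy_granularity(close_together, labels, manhattan))
```

With the same four labels, moving the two classes closer together raises the score, and swapping the distance function changes it again, which mirrors the three dependencies named in the abstract. The paper's own measures are defined axiomatically and may behave differently from this ratio.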

Code Implementations: 1 repo