CV · LG · Oct 13, 2020

Making Every Label Count: Handling Semantic Imprecision by Integrating Domain Knowledge

arXiv:2010.06469v1 · 6 citations
Originality: Incremental advance
AI Analysis

The paper addresses the challenge of using weakly labeled data for machine learning. The advance is incremental: it extends existing noise-handling methods to cover label imprecision.

The paper tackles label imprecision in noisy datasets, where labels are correct but not as detailed as the task requires. It proposes CHILLAX, a hierarchical classification method that, on noisy variants of NABirds and ILSVRC2012, outperforms strong baselines by up to 16.4 percentage points and the state of the art by up to 3.9 percentage points.

Noisy data, crawled from the web or supplied by volunteers such as Mechanical Turkers or citizen scientists, is considered an alternative to professionally labeled data. There has been research focused on mitigating the effects of label noise. It is typically modeled as inaccuracy, where the correct label is replaced by an incorrect label from the same set. We consider an additional dimension of label noise: imprecision. For example, a non-breeding snow bunting is labeled as a bird. This label is correct, but not as precise as the task requires. Standard softmax classifiers cannot learn from such a weak label because they consider all classes mutually exclusive, which non-breeding snow bunting and bird are not. We propose CHILLAX (Class Hierarchies for Imprecise Label Learning and Annotation eXtrapolation), a method based on hierarchical classification, to fully utilize labels of any precision. Experiments on noisy variants of NABirds and ILSVRC2012 show that our method outperforms strong baselines by as much as 16.4 percentage points, and the current state of the art by up to 3.9 percentage points.
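
To illustrate why a class hierarchy lets a classifier learn from a label such as "bird", here is a minimal sketch. It is not the authors' implementation. It assumes one generic way of encoding a label against a hierarchy: the label and its ancestors become positive targets, and the label's descendants are treated as unknown and masked out of the loss. The toy hierarchy and the names `PARENT`, `ancestors_and_self`, `descendants`, and `encode` are hypothetical.

```python
# Hypothetical toy hierarchy (child -> parent); not taken from the paper's datasets.
PARENT = {
    "bird": None,
    "bunting": "bird",
    "snow bunting": "bunting",
    "non-breeding snow bunting": "snow bunting",
    "breeding snow bunting": "snow bunting",
    "owl": "bird",
}
NODES = sorted(PARENT)


def ancestors_and_self(node):
    """Return the node followed by all of its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return chain


def descendants(node):
    """Return all strict descendants of a node."""
    return [n for n in NODES if n != node and node in ancestors_and_self(n)]


def encode(label):
    """Map a label of any precision to per-node targets and a loss mask.

    Assumed encoding (an illustration, not the paper's exact formulation):
    target[n] = 1 for the label and its ancestors, 0 otherwise.
    mask[n]   = 0 for descendants of the label (unknown), 1 otherwise.
    """
    positive = set(ancestors_and_self(label))
    unknown = set(descendants(label))
    target = {n: int(n in positive) for n in NODES}
    mask = {n: int(n not in unknown) for n in NODES}
    return target, mask


if __name__ == "__main__":
    for label in ("bird", "snow bunting", "non-breeding snow bunting"):
        target, mask = encode(label)
        print(label, target, mask)
```

Under this encoding, the imprecise label "bird" still contributes a positive signal for the "bird" node while saying nothing about which kind of bird it is, whereas a flat softmax over leaf classes has no target it could assign to that example.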

