CV · Oct 16, 2025

Free-Grained Hierarchical Recognition

arXiv:2510.14737v1 · 1 citation · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a practical challenge in computer vision for real-world applications where annotation granularity varies, though it is incremental in adapting existing techniques to a new benchmark.

The paper tackles hierarchical image classification with incomplete, mixed-granularity annotations by introducing the ImageNet-F benchmark and free-grain learning methods, and it reports substantial performance improvements under realistic supervision constraints.

Hierarchical image classification predicts labels across a semantic taxonomy, but existing methods typically assume complete, fine-grained annotations, an assumption rarely met in practice. Real-world supervision varies in granularity, influenced by image quality, annotator expertise, and task demands; a distant bird may be labeled Bird, while a close-up reveals Bald eagle. We introduce ImageNet-F, a large-scale benchmark curated from ImageNet and structured into cognitively inspired basic, subordinate, and fine-grained levels. Using CLIP as a proxy for semantic ambiguity, we simulate realistic, mixed-granularity labels reflecting human annotation behavior. We propose free-grain learning, with heterogeneous supervision across instances. We develop methods that enhance semantic guidance via pseudo-attributes from vision-language models and visual guidance via semi-supervised learning. These, along with strong baselines, substantially improve performance under mixed supervision. Together, our benchmark and methods advance hierarchical classification under real-world constraints.
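The idea of mixed-granularity supervision can be made concrete with a small sketch. The Python below is illustrative only and is not the paper's implementation: the function names, the 0.5 threshold, the intermediate label "Eagle", and the rule that the coarsest label is always kept are all assumptions. The per-level confidence scores stand in for whatever ambiguity proxy is used (the abstract names CLIP). The first function truncates a label path at the first level whose confidence is too low; the second averages a negative log-likelihood over only the levels that remain annotated.

```python
import math
from typing import Dict, List, Optional, Sequence


def prune_label(
    path: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[Optional[str]]:
    """Keep labels from the coarsest level down to the last confident level.

    Once one level falls below the threshold, it and all finer levels
    become None, so the result is always a prefix of the original path.
    """
    pruned: List[Optional[str]] = []
    keep = True
    for depth, (label, conf) in enumerate(zip(path, confidences)):
        # Assumption: the coarsest (basic) label is always available.
        if depth > 0 and conf < threshold:
            keep = False
        pruned.append(label if keep else None)
    return pruned


def masked_hierarchical_nll(
    probs_per_level: Sequence[Dict[str, float]],
    targets: Sequence[Optional[str]],
) -> float:
    """Average negative log-likelihood over annotated levels only."""
    losses = [
        -math.log(probs[target])
        for probs, target in zip(probs_per_level, targets)
        if target is not None  # unlabeled levels contribute nothing
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Hypothetical three-level path: basic, subordinate, fine-grained.
path = ["Bird", "Eagle", "Bald eagle"]
distant_shot = prune_label(path, [0.9, 0.3, 0.2])
close_up = prune_label(path, [0.9, 0.8, 0.7])

# Toy per-level predictions for one image labeled down to "Eagle".
predictions = [
    {"Bird": 0.5, "Dog": 0.5},
    {"Eagle": 0.25, "Owl": 0.75},
    {"Bald eagle": 0.1, "Golden eagle": 0.9},
]
loss = masked_hierarchical_nll(predictions, ["Bird", "Eagle", None])
```

Here `distant_shot` keeps only the basic label while `close_up` keeps the full path, mirroring the Bird versus Bald eagle example in the abstract. Masking missing levels is one simple baseline for heterogeneous supervision; the pseudo-attribute and semi-supervised components mentioned in the abstract are not modeled here.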

