CV · AI · LG · Nov 9, 2023

Are "Hierarchical" Visual Representations Hierarchical?

UW
arXiv:2311.05784v2 · h-index: 14 · Has Code
Originality: Synthesis-oriented
AI Analysis

This work evaluates "hierarchical" visual representations for computer vision researchers and shows that they are at best incremental: they do not outperform standard methods at capturing hierarchy.

The paper investigates whether hierarchical visual representations capture human-perceived hierarchy better than standard learned representations, finding through evaluation on 12 datasets that they do not, but they can improve search efficiency and interpretability.

Learned visual representations often capture large amounts of semantic information for accurate downstream applications. Human understanding of the world is fundamentally grounded in hierarchy. To mimic this and further improve representation capabilities, the community has explored "hierarchical" visual representations that aim to model the underlying hierarchy of the visual world. In this work, we set out to investigate whether hierarchical visual representations truly capture the human-perceived hierarchy better than standard learned representations. To this end, we create HierNet, a suite of 12 datasets spanning 3 kinds of hierarchy from the BREEDs subset of ImageNet. After extensive evaluation of Hyperbolic and Matryoshka Representations across training setups, we conclude that they do not capture hierarchy any better than the standard representations but can assist in other aspects like search efficiency and interpretability. Our benchmark and the datasets are open-sourced at https://github.com/ethanlshen/HierNet.
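For readers unfamiliar with the two representation families named in the abstract, the following is a minimal sketch of the core operation each one relies on. It is not taken from the paper or the HierNet repository. The function names `poincare_distance` and `prefix_cosine` are hypothetical, and the choice of the unit Poincaré ball and of cosine similarity on a truncated prefix are assumptions made for illustration only.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    Hyperbolic embeddings typically compare points with this distance
    instead of the Euclidean one. (Illustrative helper, not the paper's code.)
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def prefix_cosine(a, b, k):
    """Cosine similarity using only the first k dimensions of each vector.

    Matryoshka-style embeddings are trained so that such prefixes remain
    usable on their own, which allows a cheap low-dimensional search pass.
    (Illustrative helper, not the paper's code.)
    """
    a, b = a[:k], b[:k]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In the Poincaré ball, distances grow rapidly near the boundary, which is why this geometry is often considered a natural fit for tree-like structure. The prefix comparison hints at how truncated embeddings might support the search-efficiency benefit the abstract mentions: a coarse pass on a few dimensions, followed by re-ranking with the full vector.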

Code Implementations: 1 repo
