CV · May 31, 2025

Long-Tailed Visual Recognition via Permutation-Invariant Head-to-Tail Feature Fusion

arXiv:2506.00625v12 citations · h-index: 12
Originality: Incremental advance
AI Analysis

This paper addresses low tail-class accuracy in computer vision tasks, offering an incremental improvement through a plug-and-play method.

The paper tackles long-tailed visual recognition, where imbalanced training data leads models to favor head classes over tail classes. It proposes PI-H2T, which improves the representation space and corrects classifier bias, and reports improved performance on long-tailed benchmarks.

The imbalanced distribution of long-tailed data presents a significant challenge for deep learning models, causing them to prioritize head classes while neglecting tail classes. Two key factors contributing to low recognition accuracy are the deformed representation space and a biased classifier, stemming from insufficient semantic information in tail classes. To address these issues, we propose permutation-invariant and head-to-tail feature fusion (PI-H2T), a highly adaptable method. PI-H2T enhances the representation space through permutation-invariant representation fusion (PIF), yielding more clustered features and automatic class margins. Additionally, it adjusts the biased classifier by transferring semantic information from head to tail classes via head-to-tail fusion (H2TF), improving tail class diversity. Theoretical analysis and experiments show that PI-H2T optimizes both the representation space and decision boundaries. Its plug-and-play design ensures seamless integration into existing methods, providing a straightforward path to further performance improvements. Extensive experiments on long-tailed benchmarks confirm the effectiveness of PI-H2T.
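
The two fusion ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fusion operators, so the channel-replacement rule, the element-wise mean, and the names `fuse_head_to_tail`, `symmetric_fuse`, and `ratio` are all assumptions chosen for illustration. Features are plain lists of floats so the example runs with the standard library alone.

```python
import math
from typing import List


def fuse_head_to_tail(tail_feat: List[float], head_feat: List[float],
                      ratio: float) -> List[float]:
    """Hypothetical head-to-tail fusion: overwrite the first
    ceil(ratio * C) channels of a tail-class feature with the
    corresponding channels of a head-class feature.
    (Assumed rule; the paper's actual operator may differ.)"""
    if len(tail_feat) != len(head_feat):
        raise ValueError("features must have the same number of channels")
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    k = math.ceil(ratio * len(tail_feat))
    # Head channels first, remaining tail channels kept unchanged.
    return head_feat[:k] + tail_feat[k:]


def symmetric_fuse(feat_a: List[float], feat_b: List[float]) -> List[float]:
    """Hypothetical permutation-invariant fusion: the element-wise mean
    gives the same result regardless of argument order.
    (Assumed operator, used only to show the invariance property.)"""
    if len(feat_a) != len(feat_b):
        raise ValueError("features must have the same number of channels")
    return [(a + b) / 2.0 for a, b in zip(feat_a, feat_b)]


tail = [1.0, 2.0, 3.0, 4.0]
head = [9.0, 8.0, 7.0, 6.0]
fused = fuse_head_to_tail(tail, head, ratio=0.5)
mixed = symmetric_fuse(tail, head)
```

With `ratio=0.5`, half of the tail feature's channels are borrowed from the head feature, which is one simple way a tail sample could inherit semantic content from a head class. Swapping the arguments of `symmetric_fuse` leaves its output unchanged, which is the property "permutation-invariant" refers to.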

