CV · Jun 14, 2025

Generalized Category Discovery under the Long-Tailed Distribution

arXiv:2506.12515v2 · 2 citations · h-index: 1
Originality: Incremental advance
AI Analysis

This work addresses a practical problem: real-world data is often long-tailed. The contribution is nonetheless incremental, since it adapts existing GCD methods to that distribution.

The paper tackles Generalized Category Discovery (GCD) under long-tailed distributions, focusing on two challenges: classifier learning and category number estimation. It proposes a framework built on confident sample selection and density-based clustering, and reports that the framework is effective on both long-tailed and conventional GCD datasets.

This paper addresses the problem of Generalized Category Discovery (GCD) under a long-tailed distribution, which involves discovering novel categories in an unlabelled dataset using knowledge from a set of labelled categories. Existing works assume a uniform distribution for both datasets, but real-world data often exhibits a long-tailed distribution, where a few categories contain most examples, while others have only a few. While the long-tailed distribution is well-studied in supervised and semi-supervised settings, it remains unexplored in the GCD context. We identify two challenges in this setting - balancing classifier learning and estimating category numbers - and propose a framework based on confident sample selection and density-based clustering to tackle them. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
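The three ideas named in the abstract (a long-tailed class profile, confident sample selection, and density-based clustering for counting categories) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `long_tailed_counts`, `select_confident`, and `count_dense_clusters` are hypothetical, the exponential class-size profile is a common benchmark convention assumed here rather than taken from the paper, and the threshold, `eps`, and `min_samples` values are arbitrary. The clustering routine is a plain DBSCAN-style pass swapped in for whatever density-based method the paper actually uses.

```python
import math


def long_tailed_counts(num_classes, max_count, imbalance_ratio):
    """Class sizes decaying exponentially from max_count to max_count / imbalance_ratio.

    Assumed profile for illustration; the paper's exact dataset construction may differ.
    """
    steps = max(num_classes - 1, 1)
    return [
        round(max_count * (1.0 / imbalance_ratio) ** (i / steps))
        for i in range(num_classes)
    ]


def select_confident(probs, threshold=0.9):
    """Return (sample_index, predicted_class) pairs whose top probability reaches threshold."""
    selected = []
    for index, row in enumerate(probs):
        label = max(range(len(row)), key=row.__getitem__)
        if row[label] >= threshold:
            selected.append((index, label))
    return selected


def count_dense_clusters(points, eps, min_samples):
    """DBSCAN-style pass returning (number of clusters, per-point labels); -1 marks noise."""
    n = len(points)
    # Each neighbourhood includes the point itself.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    num_clusters = 0
    for seed in range(n):
        if labels[seed] != -1 or not is_core[seed]:
            continue
        labels[seed] = num_clusters
        stack = [seed]
        while stack:
            current = stack.pop()
            if not is_core[current]:
                continue  # border points join a cluster but do not grow it
            for other in neighbors[current]:
                if labels[other] == -1:
                    labels[other] = num_clusters
                    stack.append(other)
        num_clusters += 1
    return num_clusters, labels


# Toy usage with made-up numbers.
counts = long_tailed_counts(num_classes=5, max_count=100, imbalance_ratio=10)
confident = select_confident([[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]])
points = [(0, 0), (0, 0.1), (0.1, 0), (5, 5), (5, 5.1), (5.1, 5), (20, 20)]
num_clusters, labels = count_dense_clusters(points, eps=0.5, min_samples=3)
```

In this toy run the head class holds ten times as many samples as the tail class, only the two high-confidence predictions are kept, and the isolated point is treated as noise rather than as its own category. A fixed confidence threshold tends to favour head classes, which is one reason a long-tailed setting may call for more careful selection than this sketch shows.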

