CVFeb 13, 2025

Prior-Constrained Association Learning for Fine-Grained Generalized Category Discovery

arXiv:2502.09501v15 citationsh-index: 7AAAI

Originality Highly original

AI Analysis

This work addresses a challenging problem in semi-supervised learning, providing a solution for discovering novel categories in unlabeled data, which is significant for researchers and practitioners in the field of machine learning.

The authors tackled the problem of generalized category discovery, achieving significant improvements over existing methods by incorporating prior knowledge into the association process and utilizing both parametric and non-parametric classification. The proposed method outperforms existing methods by a significant margin on multiple benchmarks.

This paper addresses generalized category discovery (GCD), the task of clustering unlabeled data from potentially known or unknown categories with the help of labeled instances from each known category. Compared to traditional semi-supervised learning, GCD is more challenging because unlabeled data could be from novel categories not appearing in labeled data. Current state-of-the-art methods typically learn a parametric classifier assisted by self-distillation. While being effective, these methods do not make use of cross-instance similarity to discover class-specific semantics which are essential for representation learning and category discovery. In this paper, we revisit the association-based paradigm and propose a Prior-constrained Association Learning method to capture and learn the semantic relations within data. In particular, the labeled data from known categories provides a unique prior for the association of unlabeled data. Unlike previous methods that only adopts the prior as a pre or post-clustering refinement, we fully incorporate the prior into the association process, and let it constrain the association towards a reliable grouping outcome. The estimated semantic groups are utilized through non-parametric prototypical contrast to enhance the representation learning. A further combination of both parametric and non-parametric classification complements each other and leads to a model that outperforms existing methods by a significant margin. On multiple GCD benchmarks, we perform extensive experiments and validate the effectiveness of our proposed method.

View on arXiv PDF

Similar