LG · Mar 25

Language-Assisted Image Clustering Guided by Discriminative Relational Signals and Adaptive Semantic Centers

arXiv:2603.24275 · 48.5 · h-index: 16 · Has Code
Predicted impact: top 47% in LG (last 90 days) · Originality: Incremental advance
AI Analysis

This work offers an incremental improvement in clustering performance for computer vision applications by enhancing text-guided image clustering.

The paper tackles weak inter-class discriminability and limited use of the text modality in Language-Assisted Image Clustering. It proposes a framework that exploits cross-modal relations and adaptive semantic centers, achieving an average improvement of 2.6% over state-of-the-art methods on eight benchmark datasets.

Language-Assisted Image Clustering (LAIC) augments input images with additional text, obtained with the help of vision-language models (VLMs), to improve clustering performance. Despite recent progress, existing LAIC methods often overlook two issues: (i) the textual features constructed for each image are highly similar, leading to weak inter-class discriminability; (ii) the clustering step is restricted to pre-built image-text alignments, limiting the potential to make better use of the text modality. To address these issues, we propose a new LAIC framework with two complementary components. First, we exploit cross-modal relations to produce more discriminative self-supervision signals for clustering, an approach compatible with the training mechanisms of most VLMs. Second, we learn category-wise continuous semantic centers via prompt learning to produce the final clustering assignments. Extensive experiments on eight benchmark datasets demonstrate that our method achieves an average improvement of 2.6% over state-of-the-art methods, and that the learned semantic centers exhibit strong interpretability. Code is available in the supplementary material.
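The abstract describes both components only at a high level. The snippet below is a hypothetical, pure-Python sketch of the general idea, not the authors' implementation. It assumes images and texts are already embedded in a shared VLM space. It represents each image by its softmax-normalised similarities to a bank of text embeddings (our assumed reading of "cross-modal relations"), mines nearest neighbours from those relation vectors as candidate positive pairs for self-supervision, and softly assigns images to K semantic center vectors. In the paper the centers are learned via prompt learning; here they are fixed constants. All function names, the temperature `tau`, and the toy data are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def softmax(xs, tau=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / tau for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def relation_vectors(image_feats, text_feats, tau=0.1):
    """Hypothetical cross-modal relation: describe each image by its
    softmax-normalised similarities to a shared bank of text embeddings."""
    return [softmax([cosine(img, t) for t in text_feats], tau) for img in image_feats]

def nearest_neighbors(relations, k=1):
    """For each image, return the indices of the k other images whose relation
    vectors are most similar (candidate positive pairs for self-supervision)."""
    out = []
    for i, r in enumerate(relations):
        scores = [(cosine(r, s), j) for j, s in enumerate(relations) if j != i]
        scores.sort(reverse=True)
        out.append([j for _, j in scores[:k]])
    return out

def assign_to_centers(image_feats, centers, tau=0.1):
    """Soft cluster assignment: softmax over cosine similarity between each
    image and K semantic center vectors (fixed here, learned in the paper)."""
    return [softmax([cosine(img, c) for c in centers], tau) for img in image_feats]

# Toy 2-D embeddings (illustrative only): two images near each axis.
images = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
texts = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
centers = [[1.0, 0.0], [0.0, 1.0]]

rel = relation_vectors(images, texts)
nn = nearest_neighbors(rel)
print(nn)      # → [[1], [0], [3], [2]]

assign = assign_to_centers(images, centers)
labels = [max(range(len(p)), key=p.__getitem__) for p in assign]
print(labels)  # → [0, 0, 1, 1]
```

Comparing images through their relation vectors, rather than through raw caption features, is one plausible way to obtain more discriminative neighbours; the actual loss used to train on these pairs and to learn the centers is not specified in the abstract.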

