LG · Mar 25

Language-Assisted Image Clustering Guided by Discriminative Relational Signals and Adaptive Semantic Centers

Jun Ma, Xu Zhang, Zhengxing Jiao, Yaxin Hou, Hui Liu, Junhui Hou, Yuheng Jia

arXiv:2603.24275 · 48.5 · h-index: 16 · Has Code

Predicted impact: top 47% in LG · last 90 days · Originality: Incremental advance

AI Analysis

This work offers an incremental improvement in clustering performance for computer vision applications by enhancing text-guided image clustering.

The paper tackles the problem of weak inter-class discriminability and limited text modality utilization in Language-Assisted Image Clustering by proposing a framework that uses cross-modal relations and adaptive semantic centers, achieving an average improvement of 2.6% over state-of-the-art methods on eight benchmark datasets.

Language-Assisted Image Clustering (LAIC) augments the input images with additional texts, obtained with the help of vision-language models (VLMs), to improve clustering performance. Despite recent progress, existing LAIC methods often overlook two issues: (i) the textual features constructed for each image are highly similar, leading to weak inter-class discriminability; (ii) the clustering step is restricted to pre-built image-text alignments, which limits how well the text modality can be utilized. To address these issues, we propose a new LAIC framework with two complementary components. First, we exploit cross-modal relations to produce more discriminative self-supervision signals for clustering, an approach compatible with the training mechanisms of most VLMs. Second, we learn category-wise continuous semantic centers via prompt learning to produce the final clustering assignments. Extensive experiments on eight benchmark datasets demonstrate that our method achieves an average improvement of 2.6% over state-of-the-art methods, and that the learned semantic centers exhibit strong interpretability. Code is available in the supplementary material.
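To make the two components in the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names, the softmax temperature, and the use of cosine similarity are all assumptions chosen for illustration, and the semantic centers are fixed stand-in vectors rather than centers learned through prompt learning.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def softmax(logits, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_modal_relations(image_feats, text_feats, temperature=0.1):
    """Hypothetical relational signal: describe each image by its
    similarity distribution over a set of text features."""
    sims = l2_normalize(image_feats) @ l2_normalize(text_feats).T
    return softmax(sims / temperature, axis=1)


def assign_to_centers(image_feats, centers, temperature=0.1):
    """Soft and hard cluster assignments from cosine similarity between
    image features and one semantic center per category (assumed form)."""
    sims = l2_normalize(image_feats) @ l2_normalize(centers).T
    probs = softmax(sims / temperature, axis=1)
    return probs, probs.argmax(axis=1)


# Toy data: three orthogonal stand-in centers in an 8-dimensional space,
# with two lightly perturbed samples drawn around each center.
rng = np.random.default_rng(0)
centers = np.eye(3, 8)
image_feats = np.repeat(centers, 2, axis=0) + 0.05 * rng.normal(size=(6, 8))

relations = cross_modal_relations(image_feats, centers)
probs, labels = assign_to_centers(image_feats, centers)
```

In this toy setup each perturbed sample is assigned back to the center it was generated from. In the paper's setting the centers would presumably be trainable parameters in the VLM's embedding space, optimized using the relational self-supervision signal.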
