CV | Jun 14, 2025

Interpretable Text-Guided Image Clustering via Iterative Search

arXiv:2506.12514v2 | h-index: 6
Originality: Highly original
AI Analysis

The paper addresses the problem of ambiguous clustering criteria in computer vision, offering users a more interpretable and controllable method.

It proposes ITGC, a text-guided approach that lets users specify clustering criteria in natural language, and reports superior performance over existing methods on image clustering and fine-grained classification benchmarks.

Traditional clustering methods aim to group unlabeled data points based on their similarity to each other. However, clustering, in the absence of additional information, is an ill-posed problem, as there may be many different, yet equally valid, ways to partition a dataset. Distinct users may want to use different criteria to form clusters in the same data, e.g., shape vs. color. Recently introduced text-guided image clustering methods aim to address this ambiguity by allowing users to specify the criteria of interest using natural language instructions. This instruction provides the context and control needed to obtain clusters that are more aligned with the user's intent. We propose a new text-guided clustering approach named ITGC that uses an iterative discovery process, guided by an unsupervised clustering objective, to generate interpretable visual concepts that better capture the criteria expressed in a user's instructions. We report superior performance compared to existing methods across a wide variety of image clustering and fine-grained classification benchmarks.
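The abstract does not spell out ITGC's procedure, so the following is only a toy illustration of the general idea of an iterative search over text concepts scored by an unsupervised clustering objective. Everything in it is an assumption made for illustration: the concept names are hypothetical, the image-to-concept scores are synthetic (in practice they might come from a vision-language model), the candidate pool is fixed rather than generated, and the objective (fraction of variance explained by a k-means partition) and the greedy search are stand-ins, not the authors' method.

```python
import numpy as np

def farthest_point_init(X, k):
    """Deterministic init: start at X[0], then repeatedly add the farthest point."""
    centers = [X[0]]
    for _ in range(k - 1):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    return np.array(centers, dtype=float)

def kmeans(X, k, iters=25):
    """Plain Lloyd's algorithm; returns final labels and centers."""
    centers = farthest_point_init(X, k)
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1), centers

def clustering_objective(X, k):
    """Assumed objective: fraction of variance explained by the k-means partition."""
    labels, centers = kmeans(X, k)
    within = ((X - centers[labels]) ** 2).sum()
    total = ((X - X.mean(axis=0)) ** 2).sum()
    return 1.0 - within / total, labels

def greedy_concept_search(concept_scores, k, max_concepts=3):
    """Iteratively add the concept that most improves the objective; stop when none does."""
    selected, best_score, best_labels = [], -np.inf, None
    for _ in range(max_concepts):
        round_best = None
        for j in range(concept_scores.shape[1]):
            if j in selected:
                continue
            score, labels = clustering_objective(concept_scores[:, selected + [j]], k)
            if round_best is None or score > round_best[0]:
                round_best = (score, j, labels)
        if round_best is None or round_best[0] <= best_score + 1e-9:
            break  # no remaining concept improves the clustering objective
        best_score, best_labels = round_best[0], round_best[2]
        selected.append(round_best[1])
    return selected, best_score, best_labels

# Synthetic stand-in for image-to-concept similarity scores (60 images, 4 concepts).
rng = np.random.default_rng(0)
true_groups = np.repeat([0, 1, 2], 20)
concepts = ["round shape", "red color", "striped texture", "outdoor scene"]  # hypothetical
concept_scores = rng.normal(size=(60, 4))
# Only the first concept tracks the hidden grouping; the others are noise.
concept_scores[:, 0] = 5.0 * true_groups + rng.normal(scale=0.1, size=60)

selected, score, labels = greedy_concept_search(concept_scores, k=3)
print("selected concepts:", [concepts[j] for j in selected])
print("objective:", round(float(score), 4))
```

Because the selected concepts are named in natural language, the resulting partition can be read off directly ("images were grouped by how round they look"), which is the sense of interpretability this sketch is meant to convey.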

