CV | May 26

CIRCLED: A Multi-turn CIR Dataset with Consistent Dialogues across Domains

Tomohisa Takeda, Yu-Chieh Lin, Yuji Nozawa, Youyang Ng, Osamu Torii, Yusuke Matsui

arXiv:2605.26734 | 49.2 | Has Code

AI Analysis

For researchers in multi-modal retrieval, this provides a larger, more diverse benchmark for multi-turn composed image retrieval, though it is an incremental dataset contribution.

The authors constructed CIRCLED, a multi-turn composed image retrieval dataset with 22,608 sessions across nine domains, addressing the lack of dialogue-history consistency and domain diversity in prior datasets. The dataset exceeds Multi-turn FashionIQ in scale and generality.

Existing Multi-Turn Composed Image Retrieval (MTCIR) datasets lack dialogue-history consistency and are restricted to the fashion domain. To address these limitations, we construct CIRCLED by extending FashionIQ, CIRR, and CIRCO. In CIRCLED, the query at each turn progressively approaches the target image. Data are generated via a CIReVL-based retrieval pipeline and curated with multiple filters on retrieval success, turn length, consistency, and information redundancy to ensure quality. In total, we collect 22,608 multi-turn sessions across nine subsets, substantially exceeding Multi-turn FashionIQ (11,505 sessions) in both scale and generality. We further apply multiple baseline methods and quantitatively assess retrieval accuracy on CIRCLED. Our work provides a practical, high-quality benchmark to facilitate future research on multi-turn CIR. The dataset and code are publicly available at https://huggingface.co/datasets/tk1441/CIRCLED and https://github.com/mti-lab/circled.
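The curation step described in the abstract (filters on retrieval success, turn length, consistency, and information redundancy) can be pictured with a minimal sketch. Everything below is an illustrative assumption rather than the authors' implementation: the `Turn` and `Session` types, the thresholds (`k=10`, 2 to 5 turns), and the concrete filter definitions are hypothetical. In particular, "consistency" is approximated here as the target image's rank strictly improving at every turn, and "redundancy" as a repeated modification text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    text: str         # modification text issued at this turn (hypothetical field)
    target_rank: int  # rank of the target image after this turn, 1 = best (hypothetical field)


@dataclass
class Session:
    turns: List[Turn]


def retrieval_succeeded(s: Session, k: int = 10) -> bool:
    # Assumed criterion: the target is in the top-k after the final turn.
    return bool(s.turns) and s.turns[-1].target_rank <= k


def turn_length_ok(s: Session, min_turns: int = 2, max_turns: int = 5) -> bool:
    # Assumed bounds on the number of turns; the real limits are not stated in the abstract.
    return min_turns <= len(s.turns) <= max_turns


def is_consistent(s: Session) -> bool:
    # Assumed reading of "progressively approaches the target":
    # the target's rank must strictly improve from one turn to the next.
    ranks = [t.target_rank for t in s.turns]
    return all(later < earlier for earlier, later in zip(ranks, ranks[1:]))


def is_non_redundant(s: Session) -> bool:
    # Assumed redundancy check: no modification text is repeated (case-insensitive).
    texts = [t.text.strip().lower() for t in s.turns]
    return len(set(texts)) == len(texts)


FILTERS: List[Callable[[Session], bool]] = [
    retrieval_succeeded,
    turn_length_ok,
    is_consistent,
    is_non_redundant,
]


def curate(sessions: List[Session]) -> List[Session]:
    # Keep only the sessions that pass every filter.
    return [s for s in sessions if all(f(s) for f in FILTERS)]
```

Chaining independent predicates keeps each criterion easy to swap out, for example replacing the exact-match redundancy check with an embedding-similarity threshold.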

