IR, AI, CL, LG | Feb 17, 2025

Multi-Turn Multi-Modal Question Clarification for Enhanced Conversational Understanding

Kimia Ramezan, Alireza Amiri Bavandpour, Yifei Yuan, Clemencia Siro, Mohammad Aliannejadi

arXiv:2502.11442v1 | 10.35 citations | h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the limitation of text-only and single-turn methods in capturing complex user preferences, particularly visual attributes, in conversational search systems. It represents an incremental advance in multi-modal dialogue.

The paper tackles the refinement of user search queries through multi-turn, multi-modal conversational clarification. It introduces the MMCQ task and the ClariMM dataset, with over 13k multi-turn interactions, and proposes the Mario retrieval framework. It reports that multi-turn multi-modal clarification improves MRR by 12.88% over uni-modal and single-turn approaches.

Conversational query clarification enables users to refine their search queries through interactive dialogue, improving search effectiveness. Traditional approaches rely on text-based clarifying questions, which often fail to capture complex user preferences, particularly those involving visual attributes. While recent work has explored single-turn multi-modal clarification with images alongside text, such methods do not fully support the progressive nature of user intent refinement over multiple turns. Motivated by this, we introduce the Multi-turn Multi-modal Clarifying Questions (MMCQ) task, which combines text and visual modalities to refine user queries in a multi-turn conversation. To facilitate this task, we create a large-scale dataset named ClariMM comprising over 13k multi-turn interactions and 33k question-answer pairs containing multi-modal clarifying questions. We propose Mario, a retrieval framework that employs a two-phase ranking strategy: initial retrieval with BM25, followed by a multi-modal generative re-ranking model that integrates textual and visual information from conversational history. Our experiments show that multi-turn multi-modal clarification outperforms uni-modal and single-turn approaches, improving MRR by 12.88%. The gains are most significant in longer interactions, demonstrating the value of progressive refinement for complex queries.
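
The two-phase ranking strategy and the MRR metric mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the BM25 parameters (`k1`, `b`), the whitespace tokenizer, the function names, and the toy documents are all assumptions made for illustration. In particular, `overlap_reranker` is a hypothetical text-only stand-in for the multi-modal generative re-ranker, which the abstract does not specify in detail.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems use better ones).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document in `docs` for `query`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def two_phase_rank(query, history, docs, rerank_score, top_k=3):
    """Phase 1: BM25 keeps `top_k` candidates. Phase 2: `rerank_score` reorders them."""
    first = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: first[i], reverse=True)[:top_k]
    return sorted(
        candidates,
        key=lambda i: rerank_score(query, history, docs[i]),
        reverse=True,
    )


def overlap_reranker(query, history, doc):
    # Hypothetical placeholder: counts tokens shared between the document and
    # the query plus conversation history. The paper uses a multi-modal
    # generative model here; this stand-in ignores images entirely.
    context = set(tokenize(query + " " + " ".join(history)))
    return len(context & set(tokenize(doc)))


def mean_reciprocal_rank(rankings, relevant):
    """MRR: average of 1/rank of the relevant item (0 if it is not ranked)."""
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for pos, doc_id in enumerate(ranking, start=1):
            if doc_id == rel:
                total += 1.0 / pos
                break
    return total / len(rankings)


# Toy usage with made-up documents and a made-up clarification turn.
docs = [
    "red running shoes for men",
    "blue running shoes for women",
    "leather office shoes",
    "garden hose",
]
history = ["do you prefer this color?", "yes blue"]
ranking = two_phase_rank("running shoes", history, docs, overlap_reranker)
print(ranking, mean_reciprocal_rank([ranking], [1]))
```

In this toy setup the first phase cannot separate the two running-shoe documents, and only the conversation history lets the second phase promote the one matching the user's answer, which is the kind of refinement the clarification turns are meant to provide.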

