IR · CL · Mar 17, 2024

ConvSDG: Session Data Generation for Conversational Search

arXiv:2403.11335v1 · 15 citations · h-index: 20 · WWW
Originality: Incremental advance
AI Analysis

This work addresses a data bottleneck for conversational search systems, offering an incremental improvement through data generation.

The paper tackles the scarcity of training data for conversational dense retrieval methods by proposing ConvSDG, a framework that uses large language models to generate conversational session data, which improves search performance on four datasets.

Conversational search gives users a more convenient search interface by allowing multi-turn interaction with the search engine. However, the effectiveness of conversational dense retrieval methods is limited by the scarcity of the training data required for fine-tuning them. Generating more conversational training sessions with relevance labels could therefore improve search performance. Building on the promising text generation capabilities of large language models (LLMs), we propose ConvSDG, a simple yet effective framework that explores the feasibility of boosting conversational search by using LLMs for session data generation. Within this framework, we design dialogue/session-level and query-level data generation with unsupervised and semi-supervised learning, according to the availability of relevance judgments. The generated data are used to fine-tune the conversational dense retriever. Extensive experiments on four widely used datasets demonstrate the effectiveness and broad applicability of our ConvSDG framework compared with several strong baselines.
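The two generation granularities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that session-level generation is used when no relevance judgments exist and that query-level generation rewrites the queries of an already labeled session while reusing its labels; the abstract only states that the choice depends on the availability of relevance judgments. The names `Turn`, `generate_session`, `augment_session`, `build_training_sessions`, and the prompt strings are all hypothetical, and `llm` stands for any function that maps a prompt string to generated text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical type: any function that maps a prompt to generated text.
LLM = Callable[[str], str]


@dataclass
class Turn:
    query: str
    # Relevance judgment for this turn, if one is available (assumed format).
    positive_passage_id: Optional[str] = None


def generate_session(topic: str, llm: LLM, num_turns: int = 3) -> List[Turn]:
    """Session-level generation: create a whole unlabeled session from a topic."""
    prompt = (
        f"Write {num_turns} search queries, one per line, "
        f"for a multi-turn conversation about: {topic}"
    )
    lines = [ln.strip() for ln in llm(prompt).splitlines() if ln.strip()]
    # Keep at most num_turns queries; no relevance labels exist yet.
    return [Turn(query=ln) for ln in lines[:num_turns]]


def augment_session(session: List[Turn], llm: LLM) -> List[Turn]:
    """Query-level generation: rewrite each query and reuse its existing label."""
    augmented = []
    for turn in session:
        prompt = (
            "Rephrase this search query without changing its intent: "
            f"{turn.query}"
        )
        augmented.append(
            Turn(query=llm(prompt).strip(),
                 positive_passage_id=turn.positive_passage_id)
        )
    return augmented


def build_training_sessions(
    topic: str, llm: LLM, labeled_session: Optional[List[Turn]] = None
) -> List[Turn]:
    """Pick the generation mode from the availability of relevance judgments."""
    if labeled_session is None:
        return generate_session(topic, llm)
    return augment_session(labeled_session, llm)
```

In this sketch the unlabeled sessions would still need relevance labels from some other source before they could be used to fine-tune a dense retriever; how ConvSDG obtains them is not stated in the abstract.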

Code Implementations: 1 repo