IR | AI | Aug 30, 2024

Understanding the User: An Intent-Based Ranking Dataset

arXiv:2408.17103v1 | 3 citations | h-index: 10
Originality: Synthesis-oriented
AI Analysis

This work addresses a specific challenge in information retrieval for researchers and practitioners by providing an incremental dataset enhancement.

The paper tackles the problem of web search datasets lacking query intent descriptions by augmenting TREC-DL-21 and TREC-DL-22 with LLM-generated descriptions, validated through crowdsourcing to create an evaluation set for ranking and query rewriting tasks.

As information retrieval systems continue to evolve, accurate evaluation and benchmarking of these systems become pivotal. Web search datasets, such as MS MARCO, primarily provide short keyword queries without accompanying intent or descriptions, which makes it hard to understand the underlying information need. This paper proposes an approach to augmenting such datasets with informative query descriptions, with a focus on two prominent benchmark datasets: TREC-DL-21 and TREC-DL-22. Our methodology uses state-of-the-art LLMs to analyze and comprehend the implicit intent within individual queries from these benchmark datasets. By extracting key semantic elements, we construct detailed and contextually rich descriptions for these queries. To validate the generated query descriptions, we employ crowdsourcing as a reliable means of obtaining diverse human perspectives on the accuracy and informativeness of the descriptions. The resulting annotations can be used as an evaluation set for tasks such as ranking and query rewriting, among others.
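
The pipeline described above (an LLM writes an intent description per query, then crowd workers judge it) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `AugmentedQuery` record, the prompt wording in `build_prompt`, the 1-to-5 rating scale, and the acceptance rule in `is_validated` (at least three raters and a mean rating of 3.5 or higher). The LLM call itself is deliberately left out.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class AugmentedQuery:
    """A keyword query paired with a generated intent description (hypothetical schema)."""
    query_id: str
    query: str
    description: str                                   # text produced by an LLM
    ratings: list[int] = field(default_factory=list)   # crowd ratings, assumed 1-5 scale


def build_prompt(query: str) -> str:
    """Build an illustrative prompt asking an LLM for the intent behind a query."""
    return (
        "Describe in one or two sentences the information need behind "
        f"this web search query: {query!r}"
    )


def is_validated(item: AugmentedQuery, min_raters: int = 3, threshold: float = 3.5) -> bool:
    """Accept a description only if enough raters judged it and the mean rating is high enough."""
    if len(item.ratings) < min_raters:
        return False
    return mean(item.ratings) >= threshold


# Usage with made-up values: three raters, mean rating 4.0.
example = AugmentedQuery(
    query_id="q1",
    query="how to tie a tie",
    description="The user wants step-by-step instructions for knotting a necktie.",
    ratings=[4, 5, 3],
)
accepted = is_validated(example)
```

A mean-rating threshold is only one possible aggregation rule; majority voting or an inter-annotator agreement statistic would fit the same record layout.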

