IR, May 11

MIRA: An LLM-Assisted Benchmark for Multi-Category Integrated Retrieval

Mehmet Deniz Türkmen, Suchana Datta, Dwaipayan Roy, Daniel Hienert, Philipp Mayr, Derek Greene

arXiv:2605.11254

AI Analysis

For IR researchers, MIRA addresses the lack of benchmarks for unified retrieval across heterogeneous data sources, though it is specific to the scholarly search domain.

MIRA introduces a benchmark for multi-category integrated retrieval across four scholarly categories, built from real user queries and using LLMs to reduce annotation cost. It provides a testbed for category-aware ranking evaluation.

Users increasingly expect modern search systems to offer a unified interface that seamlessly retrieves information from diverse data sources and formats. However, current information retrieval (IR) evaluation benchmarks have not kept pace with this development, primarily due to the lack of test collections that represent the diversity of contemporary search domains. We address this gap with MIRA, a novel benchmark based on a large-scale social science search platform. MIRA is designed for category-aware ranking across four heterogeneous categories (Publications, Research Data, Variables, and Instruments & Tools) within a single, unified evaluation framework. The proposed collection is distinctive in three ways: (1) it is built upon real user queries, providing a more realistic basis for evaluation; (2) it covers scholarly items from four distinct categories, enabling multi-faceted evaluation; and (3) it uses a Large Language Model both to generate topic descriptions and narratives and to assess relevance with respect to these topics, substantially reducing the labor and cost of building the test collection. We release this resource to benefit the community by providing a foundational testbed for research on multi-faceted, category-aware, integrated, or cross-category information retrieval.
