IR, CL | Dec 17, 2024

AIR-Bench: Automated Heterogeneous Information Retrieval Benchmark

Jianlyu Chen, Nan Wang, Chaofan Li, Bo Wang, Shitao Xiao, Han Xiao, Hao Liao, Defu Lian, Zheng Liu

arXiv:2412.13102v4 | 16.0 | 16 citations | h-index: 23 | Has Code | ACL

Originality: Incremental advance

AI Analysis

AIR-Bench provides a cost-effective and efficient evaluation tool for information retrieval models in emerging domains. The advance is incremental: it builds on existing benchmark concepts and adds automation.

The paper tackles the limitations of current information retrieval benchmarks by proposing AIR-Bench, an automated benchmark that uses large language models to generate diverse testing data without human intervention, and shows it aligns well with human-labeled data.

Evaluation plays a crucial role in the advancement of information retrieval (IR) models. However, current benchmarks, which are based on predefined domains and human-labeled data, face limitations in addressing evaluation needs for emerging domains both cost-effectively and efficiently. To address this challenge, we propose the Automated Heterogeneous Information Retrieval Benchmark (AIR-Bench). AIR-Bench is distinguished by three key features: 1) Automated. The testing data in AIR-Bench is automatically generated by large language models (LLMs) without human intervention. 2) Heterogeneous. The testing data in AIR-Bench is generated with respect to diverse tasks, domains and languages. 3) Dynamic. The domains and languages covered by AIR-Bench are constantly augmented to provide an increasingly comprehensive evaluation benchmark for community developers. We develop a reliable and robust data generation pipeline to automatically create diverse and high-quality evaluation datasets based on real-world corpora. Our findings demonstrate that the generated testing data in AIR-Bench aligns well with human-labeled testing data, making AIR-Bench a dependable benchmark for evaluating IR models. The resources in AIR-Bench are publicly available at https://github.com/AIR-Bench/AIR-Bench.
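One common way to quantify how well a generated benchmark "aligns" with a human-labeled one is to check whether both rank the same set of retrieval models in a similar order. The sketch below computes a Spearman rank correlation between two score tables for that purpose. It is an illustration only and is not taken from the AIR-Bench code: the model names, the scores, and the helper names `ranks` and `spearman` are all hypothetical, and the abstract does not state which agreement measure the authors use.

```python
from typing import Dict, List


def ranks(scores: List[float]) -> List[float]:
    """Return 1-based ranks (highest score gets rank 1); ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the window over all entries tied with the current score.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result


def spearman(a: List[float], b: List[float]) -> float:
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized lists with at least 2 items")
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined when all scores are tied")
    return cov / (var_a * var_b) ** 0.5


# Hypothetical retrieval scores for four models on each kind of test set.
generated: Dict[str, float] = {"model_a": 0.52, "model_b": 0.47, "model_c": 0.61, "model_d": 0.39}
human: Dict[str, float] = {"model_a": 0.44, "model_b": 0.45, "model_c": 0.58, "model_d": 0.31}

models = sorted(generated)
rho = spearman([generated[m] for m in models], [human[m] for m in models])
print(f"Spearman rank correlation: {rho:.3f}")
```

A value near 1 would indicate that the two test sets order the models almost identically, while a value near 0 would indicate that the generated data tells a different story from the human-labeled data.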
