AI CL IR MA

Mar 24, 2025

Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning

Sky CH-Wang, Darshan Deshpande, Smaranda Muresan, Anand Kannappan, Rebecca Qian

arXiv:2503.19193v1, 8 citations, h-index: 36, ACL

Originality Synthesis-oriented

AI Analysis

This work addresses a challenging use case for general AI assistants, though its contribution is incremental: it primarily introduces a new benchmark.

The authors tackle tip-of-the-tongue search and reasoning for AI assistants by introducing the BLUR benchmark, a set of 573 real-world questions on which humans average 98% while the best-performing system scores only around 56%.

We introduce Browsing Lost Unformed Recollections (BLUR), a tip-of-the-tongue known-item search and reasoning benchmark for general AI assistants. BLUR introduces a set of 573 real-world validated questions that demand searching and reasoning across multi-modal and multilingual inputs, as well as proficient tool use. Humans easily ace these questions (scoring on average 98%), while the best-performing system scores around 56%. To facilitate progress toward addressing this challenging and aspirational use case for general AI assistants, we release 350 questions through a public leaderboard, retain the answers to 250 of them, and have the rest as a private test set.
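The split described in the abstract can be checked with a little arithmetic. The sketch below is only an illustration, not code from the paper: it assumes that the 250 retained answers are a subset of the 350 public questions (as "of them" suggests), and the function and key names are hypothetical.

```python
# Counts stated in the abstract.
TOTAL_QUESTIONS = 573          # all validated BLUR questions
PUBLIC_QUESTIONS = 350         # released through the public leaderboard
PUBLIC_ANSWERS_WITHHELD = 250  # public questions whose answers are retained


def blur_split(total: int, public: int, withheld: int) -> dict[str, int]:
    """Return the subset sizes implied by the stated counts.

    Assumption (not confirmed by the source): the withheld answers
    belong to public questions, so withheld <= public <= total.
    """
    if not 0 <= withheld <= public <= total:
        raise ValueError("expected 0 <= withheld <= public <= total")
    return {
        "public_with_answers": public - withheld,
        "public_answers_withheld": withheld,
        "private_test": total - public,
    }


split = blur_split(TOTAL_QUESTIONS, PUBLIC_QUESTIONS, PUBLIC_ANSWERS_WITHHELD)
print(split)
# {'public_with_answers': 100, 'public_answers_withheld': 250, 'private_test': 223}
```

Under that reading, 100 public questions come with answers, 250 are scored only through the leaderboard, and 223 remain in the private test set.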
