IdioLink: Retrieving Meaning Beyond Words Across Idiomatic and Literal Expressions
For NLP researchers, this benchmark exposes a key gap in idiom-aware semantic retrieval, providing a challenging testbed for future models.
IdioLink is a retrieval benchmark that tests whether models can link idiomatic expressions to conceptually equivalent literal or paraphrased meanings. Current embedding models (e.g., BGE, E5) struggle on it, relying on topical and shallow semantic cues rather than the meaning the idiom conveys.
Idioms pose a fundamental challenge for language models, as their meaning cannot be inferred from surface form alone. Understanding such expressions, therefore, requires semantic abstraction beyond lexical overlap. We introduce IdioLink, a retrieval benchmark designed to test whether models can link idiomatic expressions to conceptually equivalent meanings expressed in literal or paraphrased forms. IdioLink comprises 10,700 documents and 2,140 queries, spanning 107 idioms with both literal and figurative uses. Each document and query is annotated with spans that convey the core meaning. Evaluating strong embedding baselines (e.g., BGE, E5, Contriever, and Qwen), we show that current models struggle to retrieve equivalent meanings across divergent surface realizations, relying instead on topical and shallow semantic cues. IdioLink exposes key gaps in idiom-aware semantic retrieval and provides a challenging testbed for future models.
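As an illustration of the kind of evaluation described above, the following is a minimal sketch of scoring an embedding-based retriever. It is not the benchmark's actual protocol: the metric (fraction of queries with at least one relevant document in the top k), the function names `cosine` and `hit_rate_at_k`, the data layout, and the toy 2-D vectors are all assumptions made for illustration. In practice the vectors would come from an embedding model such as those named in the abstract.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hit_rate_at_k(query_vecs, doc_vecs, relevant, k):
    """Fraction of queries whose top-k documents contain at least one relevant one.

    query_vecs: dict mapping query id -> embedding (hypothetical layout)
    doc_vecs:   dict mapping document id -> embedding (hypothetical layout)
    relevant:   dict mapping query id -> set of relevant document ids
    """
    hits = 0
    for qid, qvec in query_vecs.items():
        # Rank all documents by similarity to the query, most similar first.
        ranked = sorted(doc_vecs, key=lambda d: cosine(qvec, doc_vecs[d]), reverse=True)
        if relevant[qid] & set(ranked[:k]):
            hits += 1
    return hits / len(query_vecs)


# Toy data: hand-made vectors standing in for model embeddings.
queries = {"q1": [1.0, 0.0], "q2": [0.0, 1.0]}
docs = {"d1": [0.9, 0.1], "d2": [0.0, 1.0], "d3": [0.7, 0.7]}
# For q2 the relevant document is d3, but d2 is closer in embedding space,
# mimicking a distractor that matches on surface or topical cues only.
relevant = {"q1": {"d1"}, "q2": {"d3"}}

score_at_1 = hit_rate_at_k(queries, docs, relevant, k=1)
score_at_2 = hit_rate_at_k(queries, docs, relevant, k=2)
```

In this toy setup, q2 misses at k=1 because the distractor d2 outranks the relevant document d3. This is the failure mode the abstract attributes to current models, reproduced here with invented numbers rather than benchmark data.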