CL · Apr 12

When Meaning Isn't Literal: Exploring Idiomatic Meaning Across Languages and Modalities

arXiv:2604.10787 · 46.12 citations · h-index: 6
AI Analysis

For NLP and multimodal AI researchers, this work provides a benchmark and a method for addressing idiomatic reasoning, a blind spot in language models and a capability crucial for culturally grounded AI.

The paper introduces Mediom, a multilingual multimodal idiom corpus with 3,533 idioms from Hindi, Bengali, and Thai, and proposes HIDE, a hinting-based framework for idiom explanation. Benchmarks reveal systematic failures in current LLMs and VLMs for figurative understanding, with HIDE improving reasoning through error-feedback retrieval.

Idiomatic reasoning, deeply intertwined with metaphor and culture, remains a blind spot for contemporary language models, whose progress skews toward surface-level lexical and semantic cues. For instance, the Bengali idiom আঙ্গুর ফল টক (angur fol tok, "grapes are sour") encodes denial-driven rationalization, yet naive models latch onto the literal fox-and-grape imagery. Addressing this oversight, we present "Mediom," a multilingual, multimodal idiom corpus of 3,533 Hindi, Bengali, and Thai idioms, each paired with gold-standard explanations, cross-lingual translations, and carefully aligned text–image representations. We benchmark both large language models (textual reasoning) and vision-language models (figurative disambiguation) on Mediom, exposing systematic failures in metaphor comprehension. To mitigate these gaps, we propose "HIDE," a Hinting-based Idiom Explanation framework that leverages error-feedback retrieval and targeted diagnostic cues for iterative reasoning refinement. Collectively, Mediom and HIDE establish a rigorous test bed and methodology for culturally grounded, multimodal idiom understanding embedded with reasoning hints in next-generation AI systems.

