Generating Continuations in Multilingual Idiomatic Contexts
This work addresses the challenge of evaluating how well generative language models understand nuanced, non-compositional figurative language, but it is incremental: it shows minimal improvements and introduces no new methods.
The study tackled the problem of generating narrative continuations for contexts containing idiomatic versus literal expressions in English and Portuguese. It found that models performed only marginally better on literal contexts than on idiomatic ones and performed similarly across both languages.
The ability to process idiomatic or literal multiword expressions is a crucial aspect of understanding and generating any language. The task of generating contextually relevant continuations for narratives containing idiomatic (or literal) expressions allows us to test how well generative language models (LMs) understand nuanced language containing non-compositional figurative text. We conduct a series of experiments using datasets in two distinct languages (English and Portuguese) under three different training settings (zero-shot, few-shot, and fine-tuned). Our results suggest that the models are only slightly better at generating continuations for literal contexts than for idiomatic contexts, with exceedingly small margins. Furthermore, the models studied in this work perform equally well across both languages, indicating the robustness of generative models in performing this task.