Testing the Ability of Language Models to Interpret Figurative Language
This work addresses the under-studied challenge of figurative language understanding in NLP, which matters for communication and cognition in AI systems. The contribution is incremental in that it builds on existing evaluation frameworks.
The paper evaluates language models' ability to interpret figurative language by introducing Fig-QA, a Winograd-style task. It finds that models perform significantly above chance but fall short of human performance, especially in zero- or few-shot settings.
Figurative and metaphorical language are commonplace in discourse, and figurative expressions play an important role in communication and cognition. However, figurative language has been a relatively under-studied area in NLP, and it remains an open question to what extent modern language models can interpret nonliteral phrases. To address this question, we introduce Fig-QA, a Winograd-style nonliteral language understanding task consisting of correctly interpreting paired figurative phrases with divergent meanings. We evaluate the performance of several state-of-the-art language models on this task, and find that although language models achieve performance significantly over chance, they still fall short of human performance, particularly in zero- or few-shot settings. This suggests that further work is needed to improve the nonliteral reasoning capabilities of language models.
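To make the Winograd-style setup concrete, the following is a minimal sketch of how accuracy on paired items could be computed. It is not the authors' evaluation code. The two example items are invented for illustration and are not taken from Fig-QA, and the names `FigItem`, `predict`, `accuracy`, and `phrase_blind_score` are hypothetical. The sketch also assumes that both phrases in a pair share the same two candidate interpretations and have opposite correct answers. In a real zero-shot evaluation, the scoring function would be replaced by a language model's score (for example, a log-likelihood) for an interpretation given the phrase.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class FigItem:
    """One figurative phrase with two candidate interpretations (hypothetical schema)."""
    phrase: str
    candidates: Tuple[str, str]
    label: int  # index of the correct interpretation


# A scorer maps (phrase, candidate interpretation) to a number; higher means more plausible.
Scorer = Callable[[str, str], float]


def predict(item: FigItem, score: Scorer) -> int:
    """Return the index of the highest-scoring candidate (first index wins ties)."""
    scores = [score(item.phrase, c) for c in item.candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(items: Sequence[FigItem], score: Scorer) -> float:
    """Fraction of items whose predicted interpretation matches the label."""
    correct = sum(predict(it, score) == it.label for it in items)
    return correct / len(items)


# Invented pair: the two phrases differ in one noun and have divergent meanings.
CANDIDATES = ("She was kind", "She was unkind")
ITEMS = [
    FigItem("Her heart was as warm as a fireplace", CANDIDATES, 0),
    FigItem("Her heart was as warm as an iceberg", CANDIDATES, 1),
]


def phrase_blind_score(phrase: str, candidate: str) -> float:
    """Stand-in scorer that ignores the phrase and simply prefers shorter candidates."""
    return -float(len(candidate))


if __name__ == "__main__":
    print(accuracy(ITEMS, phrase_blind_score))
```

Under the stated assumption about shared candidates, any scorer that ignores the phrase selects the same interpretation for both members of a pair and therefore gets exactly one of the two right. This is one reason a paired design is a useful check that a model is actually using the figurative phrase.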