Talking the Talk Does Not Entail Walking the Walk: On the Limits of Large Language Models in Lexical Entailment Recognition
This work addresses a key challenge in natural language understanding for AI researchers, though it is incremental as it benchmarks existing models without introducing new methods.
The study evaluated eight Large Language Models on lexical entailment recognition for verbs using WordNet and HyperLex datasets, finding they achieved moderately good performance with improvements from few-shot prompting but failed to perfectly solve the task.
Verbs form the backbone of language, providing the structure and meaning to sentences. Yet, their intricate semantic nuances pose a longstanding challenge. Understanding verb relations through the concept of lexical entailment is crucial for comprehending sentence meanings and grasping verb dynamics. This work investigates the capabilities of eight Large Language Models in recognizing lexical entailment relations among verbs through differently devised prompting strategies and zero-/few-shot settings over verb pairs from two lexical databases, namely WordNet and HyperLex. Our findings unveil that the models can tackle the lexical entailment recognition task with moderately good performance, although at varying degree of effectiveness and under different conditions. Also, utilizing few-shot prompting can enhance the models' performance. However, perfectly solving the task arises as an unmet challenge for all examined LLMs, which raises an emergence for further research developments on this topic.