CL AI

Sep 25, 2022

WinoDict: Probing language models for in-context word acquisition

Julian Martin Eisenschlos, Jeremy R. Cole, Fangyu Liu, William W. Cohen

DeepMind

arXiv:2209.12153v123.6274 citations

h-index: 91

Originality: Incremental advance

AI Analysis

This work addresses diachronic degradation in LLMs, which are frozen at training time and cannot adapt to language change. It provides a benchmark for measuring future improvements in in-context learning, though the advance is incremental because it builds on existing Winograd tasks.

The authors measure large language models' ability to learn novel words during inference by introducing WinoDict, a benchmark that rewrites Winograd-style co-reference resolution tasks with synthetic words. They find that model accuracy drops sharply relative to the original tasks.

We introduce a new in-context learning paradigm to measure Large Language Models' (LLMs) ability to learn novel words during inference. In particular, we rewrite Winograd-style co-reference resolution problems by replacing the key concept word with a synthetic but plausible word that the model must understand to complete the task. Solving this task requires the model to make use of the dictionary definition of the new word given in the prompt. This benchmark addresses word acquisition, one important aspect of the diachronic degradation known to afflict LLMs. As LLMs are frozen in time at the moment they are trained, they are normally unable to reflect the way language changes over time. We show that the accuracy of LLMs compared to the original Winograd tasks decreases radically in our benchmark, thus identifying a limitation of current models and providing a benchmark to measure future improvements in LLMs' ability to do in-context learning.
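
The rewriting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function name `make_winodict_prompt`, the prompt template, the synthetic word "plorn", and its definition are all hypothetical, chosen only to show how a key concept word might be swapped for an invented word whose definition is supplied in the prompt.

```python
import re


def make_winodict_prompt(sentence, key_word, synthetic_word, definition):
    """Build a WinoDict-style prompt (illustrative template, an assumption).

    Replaces every whole-word occurrence of `key_word` in a Winograd-style
    sentence with `synthetic_word`, then prepends a definition line so the
    model has to rely on the definition to resolve the co-reference.
    """
    pattern = r"\b" + re.escape(key_word) + r"\b"
    rewritten, count = re.subn(pattern, synthetic_word, sentence)
    if count == 0:
        raise ValueError(f"key word {key_word!r} not found in sentence")
    # Hypothetical prompt layout: definition first, then the rewritten task.
    return f'The word "{synthetic_word}" means {definition}.\n{rewritten}'


# A well-known Winograd-style sentence; "plorn" is an invented word.
prompt = make_winodict_prompt(
    sentence="The trophy doesn't fit in the suitcase because it is too big.",
    key_word="big",
    synthetic_word="plorn",
    definition="of large size",
)
print(prompt)
```

In this sketch, resolving "it" to the trophy rather than the suitcase is only possible if the model uses the supplied definition of the invented word, which is the ability the benchmark is meant to probe.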

