CL · AI · May 31, 2023

What does the Failure to Reason with "Respectively" in Zero/Few-Shot Settings Tell Us about Language Models?

arXiv:2305.19597v1223 citations
Originality: Synthesis-oriented
AI Analysis

This paper addresses a specific gap in the linguistic reasoning of language models: their difficulty with complex coordinate structures. The contribution is incremental, since it builds on existing NLI benchmarks.

The study investigates how language models handle sentences with "respectively" in natural language inference. It finds that fine-tuned NLI models struggle with such readings without explicit supervision, that implicitly evoked readings require longer training than explicitly cued ones, and that models fail to generalize across constructions.

Humans can effortlessly understand the coordinate structure of sentences such as "Niels Bohr and Kurt Cobain were born in Copenhagen and Seattle, respectively". In the context of natural language inference (NLI), we examine how language models (LMs) reason with respective readings (Gawron and Kehler, 2004) from two perspectives: syntactic-semantic and commonsense-world knowledge. We propose a controlled synthetic dataset WikiResNLI and a naturally occurring dataset NatResNLI to encompass various explicit and implicit realizations of "respectively". We show that fine-tuned NLI models struggle with understanding such readings without explicit supervision. While few-shot learning is easy in the presence of explicit cues, longer training is required when the reading is evoked implicitly, leaving models to rely on common sense inferences. Furthermore, our fine-grained analysis indicates models fail to generalize across different constructions. To conclude, we demonstrate that LMs still lag behind humans in generalizing to the long tail of linguistic constructions.
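To make the respective reading concrete, the sketch below builds one premise with an explicit "respectively" and labels each candidate hypothesis by whether it follows from pairing the i-th subject with the i-th object. It is a minimal illustration only, not the authors' construction of WikiResNLI or NatResNLI: the function name `respective_nli_pairs`, its parameters, and the two-way label set (`entailment` / `not_entailment`) are assumptions made for this example.

```python
def respective_nli_pairs(subjects, objects, plural_pred, singular_pred):
    """Build a 'respectively' premise and labeled hypotheses.

    Hypothetical helper for illustration. Labels follow the respective
    reading only: subject i is paired with object i, and every other
    pairing is treated as not entailed.
    """
    if len(subjects) != len(objects) or len(subjects) < 2:
        raise ValueError("need two equally long lists with at least two items")

    def conjoin(items):
        # "A and B" for two items, "A, B, and C" for longer lists.
        if len(items) == 2:
            return f"{items[0]} and {items[1]}"
        return ", ".join(items[:-1]) + f", and {items[-1]}"

    premise = (
        f"{conjoin(subjects)} {plural_pred} {conjoin(objects)}, respectively."
    )

    examples = []
    for i, subj in enumerate(subjects):
        for j, obj in enumerate(objects):
            examples.append(
                {
                    "premise": premise,
                    "hypothesis": f"{subj} {singular_pred} {obj}.",
                    # Assumption: aligned positions are entailed, all others are not.
                    "label": "entailment" if i == j else "not_entailment",
                }
            )
    return examples


if __name__ == "__main__":
    pairs = respective_nli_pairs(
        ["Niels Bohr", "Kurt Cobain"],
        ["Copenhagen", "Seattle"],
        plural_pred="were born in",
        singular_pred="was born in",
    )
    for ex in pairs:
        print(ex["label"], "|", ex["hypothesis"])
```

A model that ignores the alignment imposed by "respectively" would treat all four hypotheses alike, which is the kind of failure the abstract describes.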

