CLAIMay 24, 2023

Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds

arXiv:2305.14785v28 citations
Originality Synthesis-oriented
AI Analysis

This work identifies critical blind spots in LLMs' language understanding, which is a problem for applications relying on robust inference, though it is incremental as it builds on existing evaluation efforts.

The study evaluated large language models (LLMs) on simple linguistic inference tasks, such as entailments and monotonicity, and found they exhibited moderate to low performance, with certain syntactic structures further confusing the models and causing prediction errors.

We evaluate LLMs' language understanding capacities on simple inference tasks that most humans find trivial. Specifically, we target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments. We design evaluation sets for these tasks and conduct experiments in both zero-shot and chain-of-thought setups, and with multiple prompts and LLMs. The models exhibit moderate to low performance on these evaluation sets. Subsequent experiments show that embedding the premise in syntactic constructions that should preserve the entailment relations (presupposition triggers) or change them (non-factives), further confuses the models, causing them to either under-predict or over-predict certain entailment labels regardless of the true relation, and often disregarding the nature of the embedding context. Overall these results suggest that, despite LLMs' celebrated language understanding capacity, even the strongest models have blindspots with respect to certain types of entailments, and certain information-packaging structures act as ``blinds'' overshadowing the semantics of the embedded premise.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes