CL · AI · LG · Mar 11, 2020

A Benchmark for Systematic Generalization in Grounded Language Understanding

arXiv:2003.05161v2 · 162 citations
AI Analysis

This paper addresses the challenge of systematic generalization in grounded language understanding. The contribution is incremental, because it builds on a related benchmark.

The paper tackles neural networks' inability to interpret novel compositions in grounded language understanding. It introduces gSCAN, a benchmark that evaluates compositional generalization in a grid-world setting, and finds that state-of-the-art models fail dramatically in most cases.

Humans easily interpret expressions that describe unfamiliar situations composed from familiar parts ("greet the pink brontosaurus by the ferris wheel"). Modern neural networks, by contrast, struggle to interpret novel compositions. In this paper, we introduce a new benchmark, gSCAN, for evaluating compositional generalization in situated language understanding. Going beyond a related benchmark that focused on syntactic aspects of generalization, gSCAN defines a language grounded in the states of a grid world, facilitating novel evaluations of acquiring linguistically motivated rules. For example, agents must understand how adjectives such as 'small' are interpreted relative to the current world state or how adverbs such as 'cautiously' combine with new verbs. We test a strong multi-modal baseline model and a state-of-the-art compositional method, finding that, in most cases, they fail dramatically when generalization requires systematic compositional rules.
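To make the point about relative adjectives concrete, here is a minimal Python sketch of how a phrase like 'the small circle' might be resolved against a grid-world state. It is not gSCAN's implementation: the `GridObject` class, the `resolve_referent` function, integer object sizes, and the rule that 'small'/'big' select the extreme size among objects matching the rest of the description are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GridObject:
    """Hypothetical grid-world object: shape, color, integer size, and cell position."""
    shape: str
    color: str
    size: int
    row: int
    col: int


def resolve_referent(objects: List[GridObject], shape: str,
                     color: Optional[str] = None,
                     size_adjective: Optional[str] = None) -> Optional[GridObject]:
    """Return the object a phrase like 'the small red circle' refers to, or None.

    Assumption (illustrative only): 'small'/'big' are relative to the current
    world state, picking the smallest/largest object among those that match
    the rest of the description.
    """
    # Filter by the absolute attributes first: shape, then the optional color.
    candidates = [o for o in objects
                  if o.shape == shape and (color is None or o.color == color)]
    if not candidates:
        return None
    if size_adjective is None:
        # Without a size adjective, require the description to be unambiguous.
        return candidates[0] if len(candidates) == 1 else None
    if size_adjective == "small":
        target_size = min(o.size for o in candidates)
    elif size_adjective == "big":
        target_size = max(o.size for o in candidates)
    else:
        raise ValueError(f"unknown size adjective: {size_adjective!r}")
    matches = [o for o in candidates if o.size == target_size]
    # The relative adjective resolves only if exactly one object has the extreme size.
    return matches[0] if len(matches) == 1 else None


# The same size-2 red circle is "small" in one world and "big" in another.
world = [
    GridObject("circle", "red", 2, 0, 1),
    GridObject("circle", "blue", 4, 3, 3),
    GridObject("square", "red", 1, 5, 2),
]
world2 = [
    GridObject("circle", "red", 2, 0, 1),
    GridObject("circle", "green", 1, 4, 4),
]
print(resolve_referent(world, "circle", size_adjective="small").color)  # → red
print(resolve_referent(world2, "circle", size_adjective="big").color)   # → red
```

Because the referent depends on what else is in the grid, a model cannot succeed by memorizing a fixed mapping from 'small' to a particular size; it has to apply the comparison to each new world state.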

Code Implementations: 4 repos
