LG, ML | Sep 29, 2020

Think before you act: A simple baseline for compositional generalization

arXiv:2009.13962v2 | 16 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of compositional generalization for neural networks in language understanding, though it is incremental as it builds on existing methods and only partially solves the benchmark.

The authors tackled the problem of compositional generalization in grounded language understanding by proposing a simple attention-based model with an auxiliary loss that separates object identification from navigation, achieving surprisingly good performance on two specific test splits of the gSCAN benchmark while leaving other tasks unsolved.

Unlike humans, who can recombine familiar expressions to create novel ones, modern neural networks struggle to do so. This has been emphasized recently by the introduction of the benchmark dataset "gSCAN" (Ruis et al. 2020), which aims to evaluate models' performance at compositional generalization in grounded language understanding. In this work, we challenge the gSCAN benchmark by proposing a simple model that achieves surprisingly good performance on two of the gSCAN test splits. Our model is based on the observation that, to succeed on gSCAN tasks, the agent must (i) identify the target object (think) before (ii) navigating to it successfully (act). Concretely, we propose an attention-inspired modification of the baseline model from Ruis et al. (2020), together with an auxiliary loss, that takes into account the sequential nature of steps (i) and (ii). While two compositional tasks are trivially solved with our approach, we also find that the other tasks remain unsolved, validating the relevance of gSCAN as a benchmark for evaluating models' compositional abilities.
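The "think before you act" idea can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the attention form or the loss, so the dot-product scoring, the cross-entropy auxiliary term, and every name below (`attend_over_cells`, `auxiliary_target_loss`, `total_loss`, `aux_weight`) are assumptions chosen only to show how an auxiliary target-identification loss could be added to a navigation loss.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_over_cells(command_vec, cell_feats):
    """'Think' step (assumed form): score each grid cell against the
    command embedding with a dot product, then normalize."""
    scores = [sum(c * f for c, f in zip(command_vec, feats))
              for feats in cell_feats]
    return softmax(scores)

def auxiliary_target_loss(attention, target_index):
    """Assumed auxiliary loss: cross-entropy between the attention
    distribution and the true target cell."""
    return -math.log(attention[target_index] + 1e-12)

def total_loss(action_loss, attention, target_index, aux_weight=1.0):
    """Combine the navigation ('act') loss with the auxiliary ('think') loss.
    aux_weight is a hypothetical hyperparameter."""
    return action_loss + aux_weight * auxiliary_target_loss(attention, target_index)

# Toy example: a command embedding and four grid cells with 3-dim features.
command_vec = [1.0, 0.0, 1.0]
cell_feats = [
    [0.0, 0.0, 0.0],  # empty cell
    [1.0, 0.0, 1.0],  # cell matching both attributes in the command
    [0.0, 1.0, 0.0],  # distractor with an unrelated attribute
    [1.0, 0.0, 0.0],  # distractor sharing one attribute
]
attention = attend_over_cells(command_vec, cell_feats)
predicted_cell = max(range(len(attention)), key=attention.__getitem__)
```

In this toy setup the attention peaks on the cell whose features match the command, and the auxiliary term penalizes the model whenever that peak is not on the true target, before any navigation loss is considered.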

Code Implementations: 1 repo
