ReaSCAN: Compositional Reasoning in Language Grounding
This work addresses the need for better benchmarks for evaluating compositional generalization in language understanding models. The contribution is incremental, since it builds on prior datasets.
The authors address compositional language grounding by identifying limitations in the existing gSCAN dataset and proposing ReaSCAN, a new benchmark that requires compositional interpretation and reasoning. They show that ReaSCAN is substantially harder than gSCAN for neural models.
The ability to compositionally map language to referents, relations, and actions is an essential component of language understanding. The recent gSCAN dataset (Ruis et al. 2020, NeurIPS) is an inspiring attempt to assess the capacity of models to learn this kind of grounding in scenarios involving navigational instructions. However, we show that gSCAN's highly constrained design means that it does not require compositional interpretation and that many details of its instructions and scenarios are not required for task success. To address these limitations, we propose ReaSCAN, a benchmark dataset that builds off gSCAN but requires compositional language interpretation and reasoning about entities and relations. We assess two models on ReaSCAN: a multi-modal baseline and a state-of-the-art graph convolutional neural model. These experiments show that ReaSCAN is substantially harder than gSCAN for both neural architectures. This suggests that ReaSCAN can serve as a valuable benchmark for advancing our understanding of models' compositional generalization and reasoning capabilities.