CL · Sep 25, 2021

Systematic Generalization on gSCAN: What is Nearly Solved and What is Next?

Linlu Qiu, Hexiang Hu, Bowen Zhang, Peter Shaw, Fei Sha

arXiv:2109.12243v1 · 30.9666 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses systematic generalization in grounded language understanding, a topic of interest to AI researchers. The advance is incremental because it builds on existing benchmarks and methods.

The authors analyzed the gSCAN benchmark for systematic generalization in grounded language understanding, finding that a general-purpose Transformer model outperforms specialized approaches on most splits, while the remaining errors point to a fundamental challenge in generalizing linguistic constructs regardless of visual context. They also proposed new tasks involving relations between objects and identified data inefficiency as a challenge for future work.

We analyze the grounded SCAN (gSCAN) benchmark, which was recently proposed to study systematic generalization for grounded language understanding. First, we study which aspects of the original benchmark can be solved by commonly used methods in multi-modal research. We find that a general-purpose Transformer-based model with cross-modal attention achieves strong performance on a majority of the gSCAN splits, surprisingly outperforming more specialized approaches from prior work. Furthermore, our analysis suggests that many of the remaining errors reveal the same fundamental challenge in systematic generalization of linguistic constructs regardless of visual context. Second, inspired by this finding, we propose challenging new tasks for gSCAN by generating data to incorporate relations between objects in the visual environment. Finally, we find that current models are surprisingly data inefficient given the narrow scope of commands in gSCAN, suggesting another challenge for future work.
