CL · May 4, 2020

What is Learned in Visually Grounded Neural Syntax Acquisition

arXiv:2005.01678v21006 citations
AI Analysis

An incremental analysis, aimed at NLP researchers, that challenges assumptions about the role of visual grounding in syntax acquisition.

The paper analyzes the Visually Grounded Neural Syntax Learner to identify what it actually learns from visual signals. It finds that simplified, less expressive versions of the model perform similarly or better, and that noun concreteness, rather than complex syntactic reasoning, drives the model's predictions.

Visual features are a promising signal for learning bootstrap textual models. However, blackbox learning models make it difficult to isolate the specific contribution of visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified versions of the model, we isolate the core factors that yield the model's strong performance. Contrary to what the model might be capable of learning, we find significantly less expressive versions produce similar predictions and perform just as well, or even better. We also find that a simple lexical signal of noun concreteness plays the main role in the model's predictions as opposed to more complex syntactic reasoning.
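
As a rough illustration of the last finding, the toy sketch below shows how a single scalar score per word is enough to determine a full binary bracketing. It is not the paper's model or the Visually Grounded Neural Syntax Learner: the greedy merge rule, the names `greedy_parse` and `CONCRETENESS`, and the score values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a token or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical concreteness scores in [0, 1]; illustrative values,
# not data from the paper.
CONCRETENESS: Dict[str, float] = {
    "a": 0.1, "cat": 0.9, "sits": 0.4, "on": 0.1, "the": 0.1, "mat": 0.9,
}


def greedy_parse(tokens: List[str], scores: Dict[str, float],
                 default: float = 0.0) -> Tree:
    """Greedily merge the adjacent pair of spans with the highest mean score.

    Assumption: a merged span is scored by the mean of its two children.
    Ties are broken in favor of the leftmost pair.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    spans: List[Tuple[Tree, float]] = [(t, scores.get(t, default)) for t in tokens]
    while len(spans) > 1:
        best = max(range(len(spans) - 1),
                   key=lambda i: (spans[i][1] + spans[i + 1][1]) / 2)
        (left, ls), (right, rs) = spans[best], spans[best + 1]
        spans[best:best + 2] = [((left, right), (ls + rs) / 2)]
    return spans[0][0]


def leaves(tree: Tree) -> List[str]:
    """Return the tokens of a tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


if __name__ == "__main__":
    print(greedy_parse("a cat sits on the mat".split(), CONCRETENESS))
```

The resulting tree depends only on the per-word scores, with no explicit syntactic reasoning involved, which is the kind of behavior the analysis attributes to the model's predictions.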

Code Implementations: 1 repo