CL · Sep 20, 2021

Dependency Induction Through the Lens of Visual Perception

arXiv:2109.09790v1662 citations
Originality Highly original
AI Analysis

This work addresses a problem facing NLP researchers: the limited signal available in text-only grammar induction. It offers a novel multimodal approach that is incremental but delivers strong, specific gains.

The paper tackles unsupervised grammar induction by using word concreteness and visual information to jointly learn constituency and dependency grammars. It improves the direct attachment score (DAS) for dependency parsing by over 50% relative to state-of-the-art text-only models, and its extension outperforms state-of-the-art visually grounded models in constituency parsing with a smaller grammar size.

Most previous work on grammar induction focuses on learning phrasal or dependency structure purely from text. However, because the signal provided by text alone is limited, recently introduced visually grounded syntax models make use of multimodal information, leading to improved performance in constituency grammar induction. However, as compared to dependency grammars, constituency grammars do not provide a straightforward way to incorporate visual information without enforcing language-specific heuristics. In this paper, we propose an unsupervised grammar induction model that leverages word concreteness and a structural vision-based heuristic to jointly learn constituency-structure and dependency-structure grammars. Our experiments find that concreteness is a strong indicator for learning dependency grammars, improving the direct attachment score (DAS) by over 50% as compared to state-of-the-art models trained on pure text. Next, we propose an extension of our model that leverages both word concreteness and visual semantic role labels in constituency and dependency parsing. Our experiments show that the proposed extension outperforms the current state-of-the-art visually grounded models in constituency parsing even with a smaller grammar size.
