CL · LG · Apr 29, 2020

Do Neural Language Models Show Preferences for Syntactic Formalisms?

arXiv:2004.14096v1 · 1012 citations
AI Analysis

This paper addresses the limited scope of interpretability studies in computational linguistics by examining how syntactic encoding varies across languages and annotation formalisms. The contribution is incremental in that it builds on existing probing methods.

The study investigates whether neural language models such as BERT and ELMo prefer a deep syntactic formalism (Universal Dependencies, UD) or a surface-syntactic one (Surface-Syntactic Universal Dependencies, SUD) across 13 languages. Both models show a preference for UD, with variation across languages and layers, and the strength of that preference correlates with differences in tree shape.

Recent work on the interpretability of deep neural language models has concluded that many properties of natural language syntax are encoded in their representational spaces. However, such studies often suffer from limited scope by focusing on a single language and a single linguistic formalism. In this study, we aim to investigate the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis, and whether the patterns are consistent across different languages. We apply a probe for extracting directed dependency trees to BERT and ELMo models trained on 13 different languages, probing for two different syntactic annotation styles: Universal Dependencies (UD), prioritizing deep syntactic relations, and Surface-Syntactic Universal Dependencies (SUD), focusing on surface structure. We find that both models exhibit a preference for UD over SUD - with interesting variations across languages and layers - and that the strength of this preference is correlated with differences in tree shape.
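To make the notion of "tree shape" concrete, the following minimal sketch compares a UD-style and an SUD-style analysis of one sentence using two simple measures: tree height and mean arc length. The example sentence, its head annotations, and the helper names `tree_height` and `mean_arc_length` are illustrative assumptions and are not taken from the paper, which may quantify tree shape differently. The sketch relies only on the general convention that UD attaches function words to content words, while SUD makes auxiliaries and adpositions heads.

```python
from typing import List

# Heads are 1-based token indices; 0 marks the artificial root.
# Sentence (illustrative): "The dog was chased by the cat"
TOKENS = ["The", "dog", "was", "chased", "by", "the", "cat"]

# Assumed UD-style analysis: the content verb "chased" is the root,
# and "was" and "by" attach to content words.
UD_HEADS = [2, 4, 4, 0, 7, 7, 4]

# Assumed SUD-style analysis: the auxiliary "was" is the root,
# and the adposition "by" heads "cat".
SUD_HEADS = [2, 3, 0, 3, 4, 5, 7]


def tree_height(heads: List[int]) -> int:
    """Return the largest number of arcs between any token and the root."""
    def depth(token: int) -> int:
        steps = 0
        while token != 0:
            token = heads[token - 1]
            steps += 1
        return steps

    return max(depth(i) for i in range(1, len(heads) + 1))


def mean_arc_length(heads: List[int]) -> float:
    """Return the mean linear distance between dependents and their heads."""
    lengths = [abs(i - h) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    for name, heads in (("UD", UD_HEADS), ("SUD", SUD_HEADS)):
        print(name, tree_height(heads), round(mean_arc_length(heads), 3))
```

Under these assumed annotations the SUD tree is taller and has shorter arcs than the UD tree, which is the kind of shape difference the reported correlation refers to.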

