What do you learn from context? Probing for sentence structure in contextualized word representations
This work addresses the need for interpretability in NLP by probing how pre-trained models capture linguistic structure, though the contribution is incremental in that it builds on existing probing methods.
The authors investigate what sentence-structure information is encoded in contextualized word representations from models such as ELMo and BERT, finding that these representations are strong on syntactic tasks but offer only small improvements over a non-contextual baseline on semantic tasks.
Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT (Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of downstream NLP tasks. Building on recent token-level probing work, we introduce a novel edge probing task design and construct a broad suite of sub-sentence tasks derived from the traditional structured NLP pipeline. We probe word-level contextual representations from four recent models and investigate how they encode sentence structure across a range of syntactic, semantic, local, and long-range phenomena. We find that existing models trained on language modeling and translation produce strong representations for syntactic phenomena, but offer only comparatively small improvements on semantic tasks over a non-contextual baseline.