CLNov 13, 2019

What do you mean, BERT? Assessing BERT as a Distributional Semantics Model

arXiv:1911.05758v249 citations
Originality Synthesis-oriented
AI Analysis

This work identifies limitations in BERT's semantic modeling for NLP researchers, highlighting issues that could impact downstream tasks, but it is incremental as it builds on existing critiques of contextual embeddings.

The paper assessed BERT's contextualized embeddings as a distributional semantics model, finding that while they show some semantic coherence, they are disrupted by meaningless positional effects from sentence structure, affecting similarity relationships.

Contextualized word embeddings, i.e. vector representations for words in context, are naturally seen as an extension of previous noncontextual distributional semantic models. In this work, we focus on BERT, a deep neural network that produces contextualized embeddings and has set the state-of-the-art in several semantic tasks, and study the semantic coherence of its embedding space. While showing a tendency towards coherence, BERT does not fully live up to the natural expectations for a semantic vector space. In particular, we find that the position of the sentence in which a word occurs, while having no meaning correlates, leaves a noticeable trace on the word embeddings and disturbs similarity relationships.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes