Quantifying the Contextualization of Word Representations with Semantic Class Probing
This work addresses the need for interpretability in NLP by offering insight into model behavior, but it is incremental: it builds on existing probing methods without introducing a new paradigm.
The paper examines how pretrained language models such as BERT contextualize words by measuring how well a word's semantic classes can be inferred from its contextualized embeddings. It finds that the strongest contextualization effects occur in the lower layers, and that finetuning makes top layers more task-specific while largely preserving pretrained knowledge.
Pretrained language models have achieved a new state of the art on many NLP tasks, but there are still many open questions about how and why they work so well. We investigate the contextualization of words in BERT. We quantify the amount of contextualization, i.e., how well words are interpreted in context, by studying the extent to which the semantic classes of a word can be inferred from its contextualized embeddings. Quantifying contextualization helps in understanding and utilizing pretrained language models. We show that top-layer representations achieve high accuracy in inferring semantic classes; that the strongest contextualization effects occur in the lower layers; that local context is mostly sufficient for semantic class inference; and that top-layer representations are more task-specific after finetuning, while lower-layer representations are more transferable. Finetuning uncovers task-related features, but pretrained knowledge is still largely preserved.
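To make the probing idea concrete, the following is a minimal sketch of semantic class probing, not the setup used in this work. It assumes a linear (logistic-regression) probe and a single binary semantic class, and it replaces BERT embeddings with synthetic Gaussian vectors. The function names, the `signal` parameter, and the layer names are all hypothetical and chosen for illustration only. The sketch shows the core measurement: train a small classifier on the representations from one layer and treat its held-out accuracy as a proxy for how much class information that layer encodes.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_layer(n, dim, signal, rng):
    """Stand-in for one layer's embeddings (ASSUMPTION: not real BERT output).

    Each vector is Gaussian noise; the binary class label shifts the
    first coordinate by +signal (label 1) or -signal (label 0).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        vec = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        vec[0] += signal if label == 1 else -signal
        data.append((vec, label))
    return data


def score(w, b, vec):
    """Linear probe score: dot product of weights and embedding, plus bias."""
    return sum(wi * xi for wi, xi in zip(w, vec)) + b


def train_probe(train, dim, epochs=30, lr=0.05):
    """Fit a logistic-regression probe with plain SGD."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for vec, label in train:
            grad = sigmoid(score(w, b, vec)) - label
            w = [wi - lr * grad * xi for wi, xi in zip(w, vec)]
            b -= lr * grad
    return w, b


def probe_accuracy(w, b, test):
    """Fraction of held-out examples whose class the probe predicts correctly."""
    hits = sum((score(w, b, vec) > 0) == (label == 1) for vec, label in test)
    return hits / len(test)


def probe_layer(signal, seed=0, dim=8):
    """Train and evaluate a probe on one hypothetical layer."""
    rng = random.Random(seed)
    train = make_synthetic_layer(400, dim, signal, rng)
    test = make_synthetic_layer(200, dim, signal, rng)
    w, b = train_probe(train, dim)
    return probe_accuracy(w, b, test)


if __name__ == "__main__":
    # Hypothetical layers: one with no class signal, one with a strong signal.
    for name, signal in [("hypothetical_layer_a", 0.0),
                         ("hypothetical_layer_b", 2.0)]:
        print(name, probe_layer(signal))
```

In a real experiment, the synthetic vectors would be replaced by per-layer hidden states extracted from the model for each word occurrence, and the probe would be run once per layer so that accuracies can be compared across layers. A deliberately simple probe is used here so that accuracy reflects what the representation encodes rather than what the classifier can learn on its own.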