CL · Apr 30, 2020

Perturbed Masking: Parameter-free Probing for Analyzing and Interpreting BERT

arXiv:2004.14786v3 · 1035 citations
AI Analysis

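This paper addresses probe uncertainty, the difficulty of telling how much linguistic knowledge comes from the probe itself rather than from the language model being analyzed. It is relevant to NLP researchers, though incremental in that it builds on existing probing techniques.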

The authors tackle the problem of evaluating linguistic knowledge in pre-trained language models such as BERT. They propose a parameter-free probing method that introduces no additional parameters and requires no direct supervision. Syntactic trees recovered from BERT with this method significantly outperform linguistically uninformed baselines, and the induced structures also improve downstream sentiment classification.

By introducing a small set of additional parameters, a probe learns to solve specific linguistic tasks (e.g., dependency parsing) in a supervised manner using feature representations (e.g., contextualized embeddings). The effectiveness of such probing tasks is taken as evidence that the pre-trained model encodes linguistic knowledge. However, this approach of evaluating a language model is undermined by the uncertainty of the amount of knowledge that is learned by the probe itself. Complementary to those works, we propose a parameter-free probing technique for analyzing pre-trained language models (e.g., BERT). Our method does not require direct supervision from the probing tasks, nor do we introduce additional parameters to the probing process. Our experiments on BERT show that syntactic trees recovered from BERT using our method are significantly better than linguistically-uninformed baselines. We further feed the empirically induced dependency structures into a downstream sentiment classification task and find its improvement compatible with or even superior to a human-designed dependency schema.
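The abstract does not spell out the procedure, so the following is only a minimal sketch of the idea suggested by the title. It assumes the impact of token j on token i is measured by masking token i, then additionally masking token j, and taking the distance between the two representations at position i. The names `toy_embed`, `toy_encoder`, and `impact_matrix` are hypothetical, and the toy encoder is a deterministic stand-in for BERT, not the real model.

```python
import math

MASK = "[MASK]"


def toy_embed(token):
    """Hypothetical static embedding; [MASK] maps to the zero vector."""
    if token == MASK:
        return [0.0, 0.0]
    return [float(sum(ord(c) for c in token) % 7), float(len(token))]


def toy_encoder(tokens):
    """Hypothetical stand-in for BERT: one context-dependent vector per position.

    Each position mixes all token embeddings, weighted by 1 / (1 + distance).
    """
    vectors = []
    for i in range(len(tokens)):
        mixed = [0.0, 0.0]
        for j, token in enumerate(tokens):
            weight = 1.0 / (1 + abs(i - j))
            emb = toy_embed(token)
            mixed[0] += weight * emb[0]
            mixed[1] += weight * emb[1]
        vectors.append(mixed)
    return vectors


def impact_matrix(tokens, encoder=toy_encoder):
    """Return an n x n matrix where entry [i][j] is the impact of token j on token i.

    No probe parameters are trained: only the encoder's own outputs are compared.
    """
    n = len(tokens)
    impact = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Step 1: mask token i and read its representation.
        masked_i = list(tokens)
        masked_i[i] = MASK
        h_i = encoder(masked_i)[i]
        for j in range(n):
            if j == i:
                continue
            # Step 2: additionally mask token j and read position i again.
            masked_ij = list(masked_i)
            masked_ij[j] = MASK
            h_ij = encoder(masked_ij)[i]
            # Assumed distance metric: Euclidean distance between the two vectors.
            impact[i][j] = math.dist(h_i, h_ij)
    return impact


matrix = impact_matrix(["the", "cat", "sat"])
for row in matrix:
    print([round(value, 3) for value in row])
```

With a real model, `encoder` would be replaced by a function returning BERT's hidden states. Extracting a tree from the resulting matrix requires a separate decoding step that this sketch does not cover.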

Code Implementations: 1 repo
