CL, Feb 25, 2022

On the data requirements of probing

arXiv:2202.12801v1, 638 citations
Originality: Incremental advance
AI Analysis

This work provides a systematic framework for constructing probing datasets, aiding researchers in diagnosing neural NLP models more efficiently.

The paper addresses the lack of quantitative methods for estimating dataset sizes in probing studies of neural language models. It presents a method for estimating how many samples are sufficient to distinguish two probing configurations, and verifies across several case studies that the resulting estimates have sufficient statistical power.

As large and powerful neural language models are developed, researchers have been increasingly interested in developing diagnostic tools to probe them. There are many papers with conclusions of the form "observation X is found in model Y", using their own datasets with varying sizes. Larger probing datasets bring more reliability, but are also expensive to collect. There is yet to be a quantitative method for estimating reasonable probing dataset sizes. We tackle this omission in the context of comparing two probing configurations: after we have collected a small dataset from a pilot study, how many additional data samples are sufficient to distinguish two different configurations? We present a novel method to estimate the required number of data samples in such experiments and, across several case studies, we verify that our estimations have sufficient statistical power. Our framework helps to systematically construct probing datasets to diagnose neural NLP models.
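
To make the question "how many samples are sufficient to distinguish two configurations?" concrete, the following is a minimal sketch of a textbook power analysis for comparing two means under a normal approximation. It is not the estimator proposed in the paper, whose details are not given here. The function names `cohens_d` and `samples_per_config` and the pilot scores are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_d(pilot_a, pilot_b):
    """Standardized mean difference between two pilot samples (pooled std)."""
    na, nb = len(pilot_a), len(pilot_b)
    pooled_var = (
        (na - 1) * stdev(pilot_a) ** 2 + (nb - 1) * stdev(pilot_b) ** 2
    ) / (na + nb - 2)
    if pooled_var == 0:
        raise ValueError("pilot samples have zero variance")
    return abs(mean(pilot_a) - mean(pilot_b)) / math.sqrt(pooled_var)


def samples_per_config(effect_size, alpha=0.05, power=0.8):
    """Samples needed per configuration for a two-sided, two-sample test.

    Uses the normal approximation n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    which is an assumption of this sketch, not the paper's method.
    """
    if effect_size <= 0:
        raise ValueError("effect size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


if __name__ == "__main__":
    # Hypothetical pilot accuracies for two probing configurations.
    config_a = [0.71, 0.68, 0.74, 0.70, 0.69]
    config_b = [0.66, 0.70, 0.65, 0.69, 0.67]
    d = cohens_d(config_a, config_b)
    print(f"effect size: {d:.2f}, samples per configuration: {samples_per_config(d)}")
```

The smaller the effect size observed in the pilot study, the more samples are needed, which reflects the trade-off between reliability and collection cost described in the abstract.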

Code Implementations: 1 repo
