A Tale of a Probe and a Parser
This work addresses the problem of accurately measuring the syntactic information encoded in neural models, a question of interest to NLP researchers. It is incremental, building on existing probing and parsing techniques.
The study compares the structural probe, a probe with a novel design for extracting syntactic information from neural language models, against a traditional parser with an identical lightweight parameterisation. The parser outperforms the probe on UUAS in seven of nine languages (e.g. by 11.1 points in English), but the probe performs better under a second, less common metric.
Measuring what linguistic information is encoded in neural models of language has become popular in NLP. Researchers approach this enterprise by training "probes": supervised models designed to extract linguistic structure from another model's output. One such probe is the structural probe (Hewitt and Manning, 2019), designed to quantify the extent to which syntactic information is encoded in contextualised word representations. The structural probe has a novel design, unattested in the parsing literature, whose precise benefit is not immediately obvious. To explore whether syntactic probes would do better to make use of existing techniques, we compare the structural probe to a more traditional parser with an identical lightweight parameterisation. The parser outperforms the structural probe on UUAS in seven of the nine languages analysed, often by a substantial margin (e.g. by 11.1 points in English). Under a second, less common metric, however, the trend reverses: the structural probe outperforms the parser. This raises the question: which metric should we prefer?
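To make the terms above concrete, here is a minimal NumPy sketch of the structural probe as Hewitt and Manning (2019) describe it. A linear map B is applied to each word vector, and squared L2 distances between the projected vectors are trained to approximate distances in the gold dependency tree, using an L1 loss normalised by the squared sentence length. At evaluation, a minimum spanning tree over the predicted distances is scored by UUAS, the fraction of gold undirected edges it recovers. All function names are hypothetical. B is supplied by hand rather than learned, the training loop is omitted, and details such as excluding punctuation from UUAS are left out. The sketch does not attempt the parser that the abstract compares against.

```python
from collections import deque

import numpy as np


def probe_distances(H, B):
    """Squared probe distances d_B(h_i, h_j)^2 = ||B(h_i - h_j)||^2.
    H: (n, d) word vectors for one sentence; B: (k, d) probe matrix (hypothetical inputs)."""
    T = H @ B.T                             # project every word vector: (n, k)
    diff = T[:, None, :] - T[None, :, :]    # all pairwise differences: (n, n, k)
    return np.sum(diff ** 2, axis=-1)       # squared L2 distances: (n, n)


def tree_distances(n, edges):
    """Gold path lengths between all word pairs in an undirected tree (BFS from each word)."""
    adj = [[] for _ in range(n)]
    for e in edges:
        a, b = tuple(e)
        adj[a].append(b)
        adj[b].append(a)
    dist = np.zeros((n, n))
    for s in range(n):
        seen = {s: 0}
        q = deque([s])
        while q:
            x = q.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen[y] = seen[x] + 1
                    q.append(y)
        for t, d in seen.items():
            dist[s, t] = d
    return dist


def probe_loss(D_pred, D_gold):
    """L1 loss between gold tree distances and predicted squared distances, divided by n^2."""
    n = D_gold.shape[0]
    return np.abs(D_gold - D_pred).sum() / n ** 2


def min_spanning_tree(D):
    """Prim's algorithm on a dense symmetric distance matrix; returns undirected edges."""
    n = D.shape[0]
    in_tree = [False] * n
    in_tree[0] = True
    best = D[0].copy()      # cheapest known connection of each vertex to the tree
    parent = [0] * n        # tree vertex providing that connection
    edges = set()
    for _ in range(n - 1):
        # attach the closest vertex not yet in the tree
        _, v = min((best[u], u) for u in range(n) if not in_tree[u])
        in_tree[v] = True
        edges.add(frozenset((parent[v], v)))
        for u in range(n):
            if not in_tree[u] and D[v, u] < best[u]:
                best[u] = D[v, u]
                parent[u] = v
    return edges


def uuas(pred_edges, gold_edges):
    """Undirected unlabelled attachment score: share of gold edges found in the prediction."""
    return len(pred_edges & gold_edges) / len(gold_edges)
```

For example, four one-dimensional "word vectors" at 0, 1, 2 and 3 with B equal to the identity yield the chain 0-1-2-3 as the decoded tree. Because the probe outputs distances rather than a tree, a tree has to be recovered post hoc before UUAS can be computed. Prim's algorithm is used here because it runs in O(n^2) on a dense distance matrix.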