Maximum Spanning Trees Are Invariant to Temperature Scaling in Graph-based Dependency Parsing
This is an incremental result for researchers in NLP and dependency parsing, showing a limitation of a common calibration method.
The paper proves that temperature scaling, a post-hoc calibration technique for neural networks, does not alter the output of graph-based dependency parsers that decode with maximum spanning trees, so it cannot improve their parsing accuracy.
Modern graph-based syntactic dependency parsers predict, for each token in a sentence, a probability distribution over its possible syntactic heads (i.e., all other tokens) and then extract a maximum spanning tree from the resulting log-probabilities. Virtually all such parsers now use deep neural networks and may thus be susceptible to miscalibration (in particular, overconfident predictions). In this paper, we prove that temperature scaling, a popular technique for post-hoc calibration of neural networks, cannot change the output of this procedure. We conclude that other techniques are needed to tackle miscalibration in graph-based dependency parsers in a way that improves parsing accuracy.
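The invariance can be checked numerically on a toy sentence. The sketch below is illustrative only and is not taken from the paper: it assumes temperature scaling divides each token's head logits by a positive temperature before the softmax, assumes an artificial root node 0, and replaces a real maximum spanning tree algorithm with brute-force enumeration; all function names are hypothetical. The intuition it exercises is that every token has exactly one head in any tree, so the per-token softmax normalizers add the same constant to every tree's score, and dividing the remaining logit sum by a positive temperature preserves the ranking of trees.

```python
import itertools
import math
import random


def log_softmax(scores, temperature=1.0):
    """Log-probabilities of softmax(scores / temperature), computed stably."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def is_tree(heads):
    """heads[d - 1] is the head of token d; node 0 is an artificial root."""
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:  # cycle (also catches a token heading itself)
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def best_tree(logits, temperature=1.0):
    """Brute-force maximum spanning tree over temperature-scaled log-probs.

    logits[d - 1][h] is the raw score for head h (0..n) of token d.
    """
    n = len(logits)
    log_probs = []
    for d in range(1, n + 1):
        candidates = [h for h in range(n + 1) if h != d]  # all other nodes
        values = log_softmax([logits[d - 1][h] for h in candidates], temperature)
        log_probs.append(dict(zip(candidates, values)))

    best_heads, best_score = None, -math.inf
    for heads in itertools.product(range(n + 1), repeat=n):
        if not is_tree(heads):
            continue
        score = sum(log_probs[d][heads[d]] for d in range(n))
        if score > best_score:
            best_heads, best_score = heads, score
    return best_heads


def check_invariance(seed, n_tokens=4, temperatures=(0.25, 0.5, 2.0, 10.0)):
    """Return True if every temperature yields the same tree as T = 1."""
    rng = random.Random(seed)
    logits = [[rng.gauss(0.0, 2.0) for _ in range(n_tokens + 1)]
              for _ in range(n_tokens)]
    reference = best_tree(logits, temperature=1.0)
    return all(best_tree(logits, t) == reference for t in temperatures)
```

Calling `check_invariance(seed)` for a handful of seeds exercises the claim on random logits. Exact ties between trees, or score gaps near floating-point precision, could in principle make the brute-force argmax differ, so this is a sanity check rather than a substitute for the proof.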