Revisiting the Effects of Leakage on Dependency Parsing
This work addresses methodological issues in dependency parsing evaluation, particularly in cross-lingual settings; its contribution is incremental, since it builds on prior findings.
The study revisits the claim that leakage between training and test graphs explains more of the variation in dependency parsing performance than other factors, finds that the claim holds only for zero-shot cross-lingual settings, and proposes a more fine-grained leakage measure that also correlates with performance variation.
Recent work by Søgaard (2020) showed that, treebank size aside, overlap between training and test graphs (termed leakage) explains more of the observed variation in dependency parsing performance than other explanations. In this work we revisit this claim, testing it on more models and languages. We find that it only holds for zero-shot cross-lingual settings. We then propose a more fine-grained measure of such leakage which, unlike the original measure, not only explains but also correlates with observed performance variation. Code and data are available here: https://github.com/miriamwanner/reu-nlp-project
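
As a rough illustration of the coarse notion of leakage described above (overlap between training and test graphs), the sketch below computes the fraction of test trees whose unlabeled structure also appears in the training set. The function name `leakage`, the encoding of each tree as a tuple of head indices (0 for the root), and the toy data are all assumptions made for this example. Exact matching of head sequences is a simplification and may differ from the matching criterion used by Søgaard (2020) or in the linked repository; this is not the fine-grained measure the paper proposes.

```python
from typing import Iterable, Sequence


def leakage(train: Iterable[Sequence[int]], test: Iterable[Sequence[int]]) -> float:
    """Fraction of test trees whose head sequence also occurs in training.

    Each tree is a sequence of head indices, one per token (0 = root).
    Assumption: two trees "overlap" when their head sequences are identical.
    """
    seen = {tuple(heads) for heads in train}
    test_trees = [tuple(heads) for heads in test]
    if not test_trees:
        return 0.0  # no test trees, so nothing can leak
    leaked = sum(1 for heads in test_trees if heads in seen)
    return leaked / len(test_trees)


# Toy data (hypothetical): head-index tuples for a handful of sentences.
train_heads = [(2, 0, 2), (0, 1), (2, 0, 2, 3)]
test_heads = [(2, 0, 2), (0, 1, 2), (0, 1), (3, 3, 0)]

score = leakage(train_heads, test_heads)
print(f"leakage = {score:.2f}")
```

In this toy case two of the four test structures occur in training. A per-treebank score of this kind could then be compared against parser accuracy across treebanks to see whether the two are associated.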