CL · Oct 19, 2016

Cross-Lingual Syntactic Transfer with Limited Resources

arXiv:1610.06227v2 · 46 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of building dependency parsers for low-resource languages, offering incremental improvements over prior methods.

The paper tackles cross-lingual syntactic transfer of dependency parsers when little translation data is available. It improves on the state of the art in several languages using only the Bible as a translation corpus, and shows additional gains with Europarl data.

We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. The method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.
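
To make the idea of annotation projection concrete, the following is a minimal sketch of projecting dependency heads through word alignments and scoring how much of the target sentence ends up annotated. It is not the implementation of Rasooli and Collins (2015) or of this paper: the function names `project_heads` and `density`, the one-to-one alignment dictionary, and the definition of density as the fraction of target tokens that receive a head are all assumptions made for illustration.

```python
from typing import Dict, List, Optional


def project_heads(
    source_heads: List[int],
    alignment: Dict[int, int],
    target_len: int,
) -> List[Optional[int]]:
    """Project source dependency heads onto target tokens (illustrative).

    source_heads[i] is the 1-based head of source token i+1 (0 = root).
    alignment maps a 1-based source index to a 1-based target index and
    is assumed to be one-to-one. The result holds, for each target token,
    its projected head (0 = root) or None if nothing could be projected.
    """
    target_heads: List[Optional[int]] = [None] * target_len
    for src_idx, src_head in enumerate(source_heads, start=1):
        tgt_idx = alignment.get(src_idx)
        if tgt_idx is None:
            continue  # this source token has no aligned target token
        if src_head == 0:
            target_heads[tgt_idx - 1] = 0  # root stays root
        elif src_head in alignment:
            # Both dependent and head are aligned: copy the arc across.
            target_heads[tgt_idx - 1] = alignment[src_head]
    return target_heads


def density(target_heads: List[Optional[int]]) -> float:
    """Fraction of target tokens that received a projected head (assumed definition)."""
    if not target_heads:
        return 0.0
    return sum(h is not None for h in target_heads) / len(target_heads)


# Hypothetical example: a 3-token source sentence aligned to a 4-token
# target sentence, where target token 2 has no aligned source token.
heads = project_heads([2, 0, 2], {1: 1, 2: 3, 3: 4}, target_len=4)
print(heads, density(heads))
```

Under these assumptions, a density-driven approach would prefer projected sentences whose density is high, since fully or almost fully annotated trees give cleaner training signal than sparse ones.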

Code Implementations: 1 repo
