CL · Oct 19, 2016

Cross-Lingual Syntactic Transfer with Limited Resources

Mohammad Sadegh Rasooli, Michael Collins

arXiv:1610.06227v214.546 citations · Has Code

Originality: Incremental advance

AI Analysis

This paper addresses the problem of building dependency parsers for low-resource languages, offering incremental improvements over prior methods.

The paper tackles cross-lingual syntactic transfer for dependency parsers when translation data is limited. It improves over the state of the art in several languages using only the Bible as a translation corpus, and shows additional gains with Europarl data.

We describe a simple but effective method for cross-lingual syntactic transfer of dependency parsers, in the scenario where a large amount of translation data is not available. The method makes use of three steps: 1) a method for deriving cross-lingual word clusters, which can then be used in a multilingual parser; 2) a method for transferring lexical information from a target language to source language treebanks; 3) a method for integrating these steps with the density-driven annotation projection method of Rasooli and Collins (2015). Experiments show improvements over the state-of-the-art in several languages used in previous work, in a setting where the only source of translation data is the Bible, a considerably smaller corpus than the Europarl corpus used in previous work. Results using the Europarl corpus as a source of translation data show additional improvements over the results of Rasooli and Collins (2015). We conclude with results on 38 datasets from the Universal Dependencies corpora.
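
To make the idea of annotation projection concrete, here is a minimal sketch of projecting dependency heads from a source sentence onto its translation through word alignments, together with a simple "density" score (the fraction of target words that receive a head). It is a generic illustration, not the procedure of Rasooli and Collins (2015), whose details the abstract does not give. The function names `project_heads` and `density`, the one-to-one alignment format, and the toy sentence are all assumptions made for this example.

```python
from typing import Dict, List, Optional


def project_heads(
    source_heads: List[int],
    alignment: Dict[int, int],
    target_len: int,
) -> List[Optional[int]]:
    """Project dependency heads through a one-to-one word alignment.

    source_heads[i] is the head of source token i+1 (tokens are 1-based,
    0 denotes the root). alignment maps a source token index to a target
    token index (both 1-based). Hypothetical format, chosen for this sketch.
    Returns one entry per target token: a projected head, or None when no
    head could be projected.
    """
    target_heads: List[Optional[int]] = [None] * target_len
    for src_dep, src_head in enumerate(source_heads, start=1):
        tgt_dep = alignment.get(src_dep)
        if tgt_dep is None:
            continue  # the dependent has no aligned target word
        if src_head == 0:
            target_heads[tgt_dep - 1] = 0  # the root attachment carries over
            continue
        tgt_head = alignment.get(src_head)
        # Keep the arc only if the head is aligned and is not the same word.
        if tgt_head is not None and tgt_head != tgt_dep:
            target_heads[tgt_dep - 1] = tgt_head
    return target_heads


def density(heads: List[Optional[int]]) -> float:
    """Fraction of target tokens that received a projected head."""
    if not heads:
        return 0.0
    return sum(h is not None for h in heads) / len(heads)


# Toy example: a 3-word source sentence aligned to a 4-word target sentence.
source_heads = [2, 0, 2]            # token 2 is the root; tokens 1 and 3 attach to it
alignment = {1: 1, 2: 2, 3: 4}      # target token 3 is unaligned
projected = project_heads(source_heads, alignment, target_len=4)
print(projected, density(projected))
```

A pipeline built on such a score could, for instance, keep only sentences whose projected trees are complete or nearly complete as training data; any cutoff used for that filtering would be a design choice not stated in the abstract.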
