Cross-Lingual Dependency Parsing Using Code-Mixed TreeBank
This work addresses cross-lingual parsing for low-resource languages, but the contribution is incremental because it builds on existing treebank translation methods.
The paper tackles imperfect word alignment in cross-lingual dependency parsing by building code-mixed treebanks, which prove more effective than fully translated treebanks and achieve highly competitive results on the Universal Dependency Treebanks.
Treebank translation is a promising method for cross-lingual transfer of syntactic dependency knowledge. The basic idea is to map dependency arcs from a source treebank to its target translation according to word alignments. This method, however, can suffer from imperfect alignment between source and target words. To address this problem, we investigate syntactic transfer by code mixing, translating only confident words in a source treebank. Cross-lingual word embeddings are leveraged for transferring syntactic knowledge to the target from the resulting code-mixed treebank. Experiments on the Universal Dependency Treebanks show that code-mixed treebanks are more effective than translated treebanks, giving highly competitive performance among cross-lingual parsing methods.
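The code-mixing idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the `Token` class, the `code_mix` function, the lexicon format (source word mapped to a translation and a confidence score), and the threshold value are all hypothetical names and assumptions introduced here. The sketch also assumes one-to-one word substitution with no reordering, which is a simplification of alignment-based translation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Token:
    form: str          # surface word
    head: int          # 1-based index of the head token, 0 for the root
    deprel: str        # dependency relation label
    lang: str = "src"  # "src" if left untranslated, "tgt" if replaced


def code_mix(
    sentence: List[Token],
    lexicon: Dict[str, Tuple[str, float]],
    threshold: float,
) -> List[Token]:
    """Replace only confidently translated words; keep the tree unchanged.

    `lexicon` maps a lowercased source word to (target word, confidence).
    This format is an assumption made for illustration.
    """
    mixed = []
    for tok in sentence:
        entry = lexicon.get(tok.form.lower())
        if entry is not None and entry[1] >= threshold:
            # Confident translation: swap the word, keep head and label.
            mixed.append(Token(entry[0], tok.head, tok.deprel, "tgt"))
        else:
            # Low confidence or unknown word: keep the source word.
            mixed.append(Token(tok.form, tok.head, tok.deprel, "src"))
    return mixed


# Toy example: only "dog" has a confident translation.
sentence = [
    Token("the", 2, "det"),
    Token("dog", 3, "nsubj"),
    Token("barks", 0, "root"),
]
lexicon = {"the": ("el", 0.4), "dog": ("perro", 0.9)}
mixed = code_mix(sentence, lexicon, threshold=0.5)
print([(t.form, t.lang, t.head, t.deprel) for t in mixed])
```

Because the dependency arcs are copied untouched, no arc has to be projected through an uncertain alignment. The resulting sentence mixes source and target words, which is why a parser trained on it would need a shared representation such as cross-lingual word embeddings.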