CL · Oct 17, 2019

Cross-lingual Parsing with Polyglot Training and Multi-treebank Learning: A Faroese Case Study

James Barry, Joachim Wagner, Jennifer Foster

arXiv:1910.07938

Originality: Synthesis-oriented

AI Analysis

This work addresses the problem of inducing parsers for low-resource languages such as Faroese, but it is incremental, building on existing annotation-projection and multi-treebank methods.

The study tackled cross-lingual dependency parsing for low-resource languages by comparing polyglot training and multi-treebank learning on Faroese. Polyglot training generally improved results, but the single best result came from projecting from monolingual source models and then training multi-treebank models on the target side.

Cross-lingual dependency parsing involves transferring syntactic knowledge from one language to another. It is a crucial component for inducing dependency parsers in low-resource scenarios where no training data for a language exists. Using Faroese as the target language, we compare two approaches using annotation projection: first, projecting from multiple monolingual source models; second, projecting from a single polyglot model trained on the combination of all source languages. Furthermore, we reproduce multi-source projection (Tyers et al., 2018), in which dependency trees of multiple sources are combined. Finally, we apply multi-treebank modelling to the projected treebanks, either in addition to or instead of polyglot modelling on the source side. We find that polyglot training on the source languages produces an overall trend of better results on the target language, but the single best result for the target language is obtained by projecting from monolingual source parsing models and then training multi-treebank POS tagging and parsing models on the target side.
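Annotation projection and multi-source combination can be illustrated in a few lines. The following is a minimal, hypothetical Python sketch, not the authors' implementation. It assumes unlabeled trees stored as 1-based head lists (0 = ROOT), one-to-one word alignments given as a dict from source to target positions (a hypothetical input format), and combines sources by per-token weighted head voting. That voting step is a simplification: it can yield a structure that is not a tree, which is why the sketch includes an `is_tree` check, and the combination used by Tyers et al. (2018) may differ (for example, by decoding a maximum spanning tree over the vote scores). The names `project_tree`, `vote_heads` and `is_tree` are illustrative only.

```python
from collections import Counter
from typing import Dict, List, Optional

# Assumption: heads[i] is the head of token i+1 (1-based); 0 is the artificial ROOT.
# None marks a target token that received no projected head.
Heads = List[Optional[int]]


def project_tree(src_heads: List[int], alignment: Dict[int, int], n_tgt: int) -> Heads:
    """Copy source arcs onto the target through a one-to-one word alignment.

    alignment maps 1-based source positions to 1-based target positions
    (hypothetical format). Arcs whose dependent or head is unaligned are dropped.
    """
    tgt_heads: Heads = [None] * n_tgt
    for s_dep, s_head in enumerate(src_heads, start=1):
        t_dep = alignment.get(s_dep)
        if t_dep is None:
            continue  # unaligned source dependent: nothing to project
        if s_head == 0:
            tgt_heads[t_dep - 1] = 0  # ROOT attachment maps to ROOT
        elif s_head in alignment:
            tgt_heads[t_dep - 1] = alignment[s_head]
    return tgt_heads


def vote_heads(projections: List[Heads], weights: Optional[List[float]] = None) -> Heads:
    """Simplified multi-source combination: each token takes its highest-weighted head."""
    weights = weights or [1.0] * len(projections)
    combined: Heads = []
    for i in range(len(projections[0])):
        votes: Counter = Counter()
        for proj, w in zip(projections, weights):
            if proj[i] is not None:
                votes[proj[i]] += w
        # Ties go to the head first seen; tokens with no votes stay None.
        combined.append(votes.most_common(1)[0][0] if votes else None)
    return combined


def is_tree(heads: Heads) -> bool:
    """True if every token has a head, exactly one token attaches to ROOT, and there are no cycles."""
    if any(h is None for h in heads) or heads.count(0) != 1:
        return False
    for start in range(1, len(heads) + 1):
        seen = set()
        node = start
        while node != 0:  # walk up to ROOT; revisiting a node means a cycle
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


if __name__ == "__main__":
    # Three toy source parses of a 3-token target sentence (invented data).
    proj_a = project_tree([2, 0, 2], {1: 1, 2: 2, 3: 3}, 3)  # same word order
    proj_b = project_tree([0, 1, 1], {1: 2, 2: 1, 3: 3}, 3)  # reordered source
    proj_c = project_tree([0, 1, 1], {1: 1, 2: 2}, 3)        # token 3 unaligned
    combined = vote_heads([proj_a, proj_b, proj_c])
    print(combined, is_tree(combined))  # → [2, 0, 2] True
```

Passing weights (for instance, a per-source confidence) lets more reliable sources dominate: with `vote_heads([proj_a, proj_c], weights=[1.0, 2.0])` the vote flips to `[0, 1, 2]`, which still passes `is_tree`.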
