CL, Sep 16, 2021

Revisiting Tri-training of Dependency Parsers

arXiv:2109.08122v1661 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses dependency parsing for low-resource languages such as Hungarian, Uyghur, and Vietnamese. It is incremental in that it revisits and combines existing techniques rather than introducing a new one.

The study compares tri-training and pretrained word embeddings for dependency parsing in low-resource settings. It finds that pretrained embeddings make more effective use of unlabelled data than tri-training, but that the two methods can be successfully combined.

We compare two orthogonal semi-supervised learning techniques, namely tri-training and pretrained word embeddings, in the task of dependency parsing. We explore language-specific FastText and ELMo embeddings and multilingual BERT embeddings. We focus on a low-resource scenario, as semi-supervised learning can be expected to have the most impact here. Based on treebank size and available ELMo models, we select Hungarian, Uyghur (a zero-shot language for mBERT) and Vietnamese. Furthermore, we include English in a simulated low-resource setting. We find that pretrained word embeddings make more effective use of unlabelled data than tri-training but that the two approaches can be successfully combined.
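For readers unfamiliar with tri-training, the general idea can be sketched on a toy classification task. This is a minimal illustration of the generic scheme, not the parser setup used in the paper: three learners start from different bootstrap samples of the labelled data, and each is then retrained on the labelled data plus those unlabelled items on which the other two learners agree. The nearest-centroid learner and the names `train_centroid` and `tri_train` are hypothetical stand-ins chosen to keep the example self-contained.

```python
import random
from collections import Counter


def train_centroid(examples):
    """Fit a toy nearest-centroid classifier on (x, label) pairs with scalar x.

    Hypothetical stand-in for a real learner such as a dependency parser.
    """
    sums, counts = Counter(), Counter()
    for x, y in examples:
        sums[y] += x
        counts[y] += 1
    centroids = {y: sums[y] / counts[y] for y in counts}
    # The returned model maps x to the label of the closest centroid.
    return lambda x: min(centroids, key=lambda y: abs(x - centroids[y]))


def tri_train(labelled, unlabelled, rounds=3, seed=0):
    """Simplified tri-training loop (assumption: fixed number of rounds)."""
    rng = random.Random(seed)
    # Three learners start from different bootstrap samples of the labelled data.
    samples = [[rng.choice(labelled) for _ in labelled] for _ in range(3)]
    models = [train_centroid(s) for s in samples]

    for _ in range(rounds):
        new_models = []
        for i in range(3):
            j, k = [m for m in range(3) if m != i]
            # Pseudo-label the items on which the other two learners agree.
            pseudo = [
                (x, models[j](x))
                for x in unlabelled
                if models[j](x) == models[k](x)
            ]
            # Retrain learner i on the original labels plus the pseudo-labels.
            new_models.append(train_centroid(labelled + pseudo))
        models = new_models

    def predict(x):
        # Final prediction is a majority vote over the three learners.
        return Counter(m(x) for m in models).most_common(1)[0][0]

    return predict
```

Calling `tri_train(labelled, unlabelled)` with a list of `(value, label)` pairs and a list of unlabelled values returns a `predict` function that takes a majority vote over the three learners. A real system would replace the toy learner with a parser and would typically add stopping criteria and filtering of the pseudo-labelled data.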

Code Implementations: 2 repos
