CL, Mar 31, 2017

Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data

arXiv:1703.10772v1, 32 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of parsing code-mixed text produced by multilingual speakers, but the advance is incremental because it builds on existing monolingual resources.

The paper tackles the parsing of code-mixed data by proposing strategies that leverage pre-existing monolingual annotated resources. These strategies achieve significantly better results than an informed baseline. The paper also presents a manually annotated dataset of 450 Hindi-English code-mixed tweets for evaluation.

In this paper, we propose efficient and less resource-intensive strategies for parsing code-mixed data. These strategies are not constrained by in-domain annotations; rather, they leverage pre-existing monolingual annotated resources for training. We show that these methods produce significantly better results than an informed baseline. We also present a data set of 450 Hindi and English code-mixed tweets of Hindi multilingual speakers for evaluation. The data set is manually annotated with Universal Dependencies.

