CL, Mar 31, 2017

Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data

Irshad Ahmad Bhat, Riyaz Ahmad Bhat, Manish Shrivastava, Dipti Misra Sharma

arXiv:1703.10772v16.632 citations

Originality: Incremental advance

AI Analysis

This paper addresses the challenge of parsing code-mixed text written by multilingual speakers, but the advance is incremental because it builds on existing monolingual resources.

The paper tackles the parsing of code-mixed data by proposing strategies that leverage pre-existing monolingual annotated resources, and it reports significantly better results than an informed baseline. It also presents a manually annotated dataset of 450 Hindi-English code-mixed tweets for evaluation.

In this paper, we propose efficient and less resource-intensive strategies for parsing of code-mixed data. These strategies are not constrained by in-domain annotations; rather, they leverage pre-existing monolingual annotated resources for training. We show that these methods can produce significantly better results than an informed baseline. In addition, we present a data set of 450 Hindi and English code-mixed tweets of Hindi multilingual speakers for evaluation. The data set is manually annotated with Universal Dependencies.
