Joining Hands: Exploiting Monolingual Treebanks for Parsing of Code-mixing Data
This paper addresses the challenge of parsing code-mixed text produced by multilingual speakers, but the contribution is incremental, since it builds on existing monolingual resources.
The paper tackles the parsing of code-mixed data by proposing strategies that leverage pre-existing monolingual annotated resources, and it reports significantly better results than an informed baseline. It also presents a manually annotated dataset of 450 Hindi-English code-mixed tweets for evaluation.
In this paper, we propose efficient and less resource-intensive strategies for parsing code-mixed data. These strategies are not constrained by in-domain annotations; rather, they leverage pre-existing monolingual annotated resources for training. We show that these methods produce significantly better results than an informed baseline. In addition, we present a data set of 450 Hindi-English code-mixed tweets by Hindi multilingual speakers for evaluation. The data set is manually annotated with Universal Dependencies.