CL · Oct 31, 2016

Experiments with POS Tagging Code-mixed Indian Social Media Text

arXiv:1610.09799v1 · 3.64 citations

Originality: Synthesis-oriented

AI Analysis

This paper addresses POS tagging for code-mixed text in Indian languages, an incremental contribution to NLP for social media analysis.

The paper tackles POS tagging for code-mixed Indian social media text in Hindi, Bengali, and Telugu mixed with English, using machine learning with word2vec features and log-linear models; no concrete performance numbers are reported in the abstract.

This paper presents the Centre for Development of Advanced Computing Mumbai's (CDACM) submission to the NLP Tools Contest on Part-Of-Speech (POS) Tagging For Code-mixed Indian Social Media Text (POSCMISMT) 2015, collocated with ICON 2015. We submitted results for Hindi (hi), Bengali (bn), and Telugu (te) mixed with English (en). In this paper, we describe the POS tagging techniques we exploited for this task. We used machine learning to POS tag the mixed-language text: specifically, we tried distributed representations of words in vector space (word2vec) for feature extraction, together with log-linear models. We report our work on all three language pairs, hi, bn, and te, each mixed with en.
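The abstract does not give the feature set, context window, or training procedure, so the following is only a minimal sketch under stated assumptions of how word2vec features can feed a log-linear tagger. It models p(tag | word) as proportional to exp(w_tag · φ(word)), where φ concatenates the embeddings of the previous, current, and next word plus a bias. Everything concrete here is hypothetical: the 3-dimensional vectors in `EMB` are hand-set stand-ins for trained word2vec embeddings, the three-tag set is not the contest's tagset, and plain SGD on cross-entropy is an assumed training method.

```python
import math
import random

# Hypothetical toy embeddings. A real system would learn these with word2vec
# on unlabelled code-mixed text; here they are hand-set so that translation
# pairs such as hi "ghar" / en "home" share a vector.
EMB = {
    "main": [0.0, 0.0, 1.0], "i": [0.0, 0.0, 1.0],      # pronouns
    "ghar": [1.0, 0.0, 0.0], "home": [1.0, 0.0, 0.0],   # nouns
    "ja": [0.0, 1.0, 0.0], "raha": [0.0, 1.0, 0.0],     # verbs / auxiliaries
    "am": [0.0, 1.0, 0.0], "going": [0.0, 1.0, 0.0],
}
DIM = 3
TAGS = ["PRON", "NOUN", "VERB"]  # hypothetical coarse tagset


def embed(word):
    # Out-of-vocabulary words map to a zero vector.
    return EMB.get(word.lower(), [0.0] * DIM)


def features(words, i):
    # phi(x): embeddings of previous, current and next word, plus a bias term.
    prev = embed(words[i - 1]) if i > 0 else [0.0] * DIM
    nxt = embed(words[i + 1]) if i + 1 < len(words) else [0.0] * DIM
    return prev + embed(words[i]) + nxt + [1.0]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def probs(W, phi):
    # Log-linear model: p(y | x) is proportional to exp(w_y . phi(x)).
    return softmax([sum(w * f for w, f in zip(row, phi)) for row in W])


def train(data, epochs=300, lr=0.5, seed=0):
    rng = random.Random(seed)
    n_feat = 3 * DIM + 1
    W = [[0.0] * n_feat for _ in TAGS]
    examples = [(features(ws, i), TAGS.index(t))
                for ws, ts in data for i, t in enumerate(ts)]
    for _ in range(epochs):
        rng.shuffle(examples)
        for phi, gold in examples:
            p = probs(W, phi)
            # SGD step on the negative log-likelihood (cross-entropy).
            for y in range(len(TAGS)):
                g = p[y] - (1.0 if y == gold else 0.0)
                for k in range(n_feat):
                    W[y][k] -= lr * g * phi[k]
    return W


def tag(W, words):
    # Greedy per-token decoding: pick the most probable tag for each word.
    out = []
    for i in range(len(words)):
        p = probs(W, features(words, i))
        out.append(TAGS[max(range(len(TAGS)), key=p.__getitem__)])
    return out


# Hypothetical romanized hi-en training sentences.
TRAIN = [
    ("main ghar ja raha".split(), ["PRON", "NOUN", "VERB", "VERB"]),
    ("i am going home".split(), ["PRON", "VERB", "VERB", "NOUN"]),
]

if __name__ == "__main__":
    W = train(TRAIN)
    # Expected on this toy data: ['PRON', 'NOUN', 'VERB', 'VERB']
    print(tag(W, "main home ja raha".split()))
```

Because "home" and "ghar" share a vector in this toy setup, the mixed sentence "main home ja raha" receives the same tags as its all-Hindi counterpart; this illustrates the kind of cross-language sharing that embedding features can offer a tagger, though how well trained word2vec vectors achieve it on real code-mixed data is an empirical question.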

