CL May 30, 2018

An English-Hindi Code-Mixed Corpus: Stance Annotation and Baseline System

Sahil Swami, Ankush Khandelwal, Vinay Singh, Syed Sarfaraz Akhtar, Manish Shrivastava

arXiv:1805.11868v10.822 citations

Originality: Synthesis-oriented

AI Analysis

This work provides a resource for analyzing social media opinions in code-mixed languages, though its contribution is incremental because it focuses on a single dataset and task.

The authors tackled stance detection in English-Hindi code-mixed tweets on demonetization in India, creating a dataset of 3545 tweets and achieving a baseline accuracy of 58.7% with supervised classification.

Social media has become one of the main channels for people to communicate and share their views with the society. We can often detect from these views whether the person is in favor, against or neutral towards a given topic. These opinions from social media are very useful for various companies. We present a new dataset that consists of 3545 English-Hindi code-mixed tweets with opinion towards Demonetisation that was implemented in India in 2016 which was followed by a large countrywide debate. We present a baseline supervised classification system for stance detection developed using the same dataset that uses various machine learning techniques to achieve an accuracy of 58.7% on 10-fold cross validation.
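
The abstract reports accuracy under 10-fold cross validation without describing the procedure. A minimal sketch of how such a score is typically computed is given below. It is not the authors' system: the majority-label model, the contiguous fold layout, the helper names (`k_fold_splits`, `train_majority`, `cross_val_accuracy`) and the toy labels are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold is the test set once."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    base, extra = divmod(n, k)
    splits = []
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional item.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def train_majority(labels: Sequence[str]) -> str:
    """Hypothetical stand-in model: remember the most frequent training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_val_accuracy(labels: Sequence[str], k: int = 10) -> float:
    """Mean per-fold accuracy of the majority-label baseline."""
    fold_accuracies = []
    for train_idx, test_idx in k_fold_splits(len(labels), k):
        predicted = train_majority([labels[i] for i in train_idx])
        correct = sum(1 for i in test_idx if labels[i] == predicted)
        fold_accuracies.append(correct / len(test_idx))
    return sum(fold_accuracies) / k


# Toy labels invented for illustration only (not the corpus distribution).
toy_labels = ["favor", "against", "neutral", "neutral", "neutral"] * 20
print(round(cross_val_accuracy(toy_labels, k=10), 3))
```

In practice the data would usually be shuffled or stratified by stance label before splitting, and the stand-in model would be replaced by a real classifier trained on features extracted from the tweets.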
