CL · May 30, 2018

An English-Hindi Code-Mixed Corpus: Stance Annotation and Baseline System

arXiv:1805.11868v1 · 22 citations
Originality: Synthesis-oriented
AI Analysis

The paper provides a resource for analyzing social media opinions in code-mixed languages, though its contribution is incremental because it focuses on a single dataset and task.

The authors tackled stance detection in English-Hindi code-mixed tweets on demonetization in India, creating a dataset of 3545 tweets and achieving a baseline accuracy of 58.7% with supervised classification.

Social media has become one of the main channels for people to communicate and share their views with society. From these views we can often detect whether a person is in favor of, against, or neutral towards a given topic. These opinions from social media are very useful for various companies. We present a new dataset of 3545 English-Hindi code-mixed tweets expressing opinions on Demonetisation, which was implemented in India in 2016 and was followed by a large countrywide debate. We also present a baseline supervised classification system for stance detection, developed on the same dataset, that uses various machine learning techniques to achieve an accuracy of 58.7% under 10-fold cross-validation.
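The 10-fold cross-validation protocol mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not specify their features or classifiers, so a majority-class predictor stands in for the classifier. The function names and the toy labels are hypothetical, and only the three stance labels (favor, against, neutral) come from the abstract.

```python
from collections import Counter


def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(n_items, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_label(labels):
    """Return the most frequent label (a stand-in for a trained classifier)."""
    return Counter(labels).most_common(1)[0][0]


def cross_validated_accuracy(labels, k=10):
    """Average held-out accuracy of the majority-class predictor over k folds."""
    scores = []
    for held_out in k_fold_indices(len(labels), k):
        held = set(held_out)
        # "Train" on every item outside the held-out fold.
        train = [lab for i, lab in enumerate(labels) if i not in held]
        prediction = majority_label(train)
        correct = sum(1 for i in held_out if labels[i] == prediction)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Hypothetical toy labels, not taken from the corpus.
toy_labels = ["against", "favor", "against", "neutral", "against"] * 4
accuracy = cross_validated_accuracy(toy_labels, k=10)
```

With 3545 items and k=10, this split yields five folds of 355 tweets and five of 354. A real baseline would replace `majority_label` with a classifier trained on features of the training tweets, and would typically shuffle or stratify the data before splitting.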

Code Implementations: 1 repo