CL · Oct 17, 2020

CUSATNLP@HASOC-Dravidian-CodeMix-FIRE2020: Identifying Offensive Language from Manglish Tweets

arXiv:2010.08756v1 · 5 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the need to detect offensive content in code-mixed social media posts, particularly those written by speakers of Dravidian languages, but it is incremental, since it applies an existing method to a new dataset.

The paper tackled the problem of identifying offensive language in Manglish tweets, which mix English with Malayalam (a Dravidian language), by developing an embedding model-based classifier for a message-level classification task, and reported results on the Manglish dataset from the HASOC 2020 DravidianCodeMix sub-track.

With the popularity of social media, communication through blogs, Facebook, Twitter, and other platforms has increased. Initially, English was the only medium of communication. Fortunately, people can now communicate in any language. This has led people to mix English with their native language or mother tongue. Sometimes comments in other languages are transliterated into English (Latin) script; in other cases, people use the script of the intended language. Identifying sentiment and offensive content in such code-mixed tweets is therefore a necessary task. We present a working model submitted for Task 2 of the sub-track HASOC Offensive Language Identification - DravidianCodeMix at the Forum for Information Retrieval Evaluation (FIRE) 2020. It is a message-level classification task. In our approach, an embedding model-based classifier identifies offensive and not-offensive comments. We applied this method to the Manglish dataset provided with the sub-track.
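The abstract does not say which embedding model the classifier uses. As a minimal sketch under that uncertainty, the Python below trains a fastText-style classifier, which is a swapped-in technique and not necessarily the authors' model. Each token gets a learned vector, a message is represented by the mean of its token vectors, and a logistic output predicts offensive (1) versus not offensive (0). The toy romanised examples, the `MeanEmbeddingClassifier` name, the whitespace tokenizer, and all hyperparameters are hypothetical; none come from the paper or the HASOC data.

```python
import math
import random

# Hypothetical toy examples in romanised Manglish; NOT the HASOC 2020 data.
# Label 1 = offensive, 0 = not offensive.
TRAIN = [
    ("nalla padam super movie", 0),
    ("adipoli song ishtamayi", 0),
    ("nee oru waste aanu idiot", 1),
    ("poda stupid fellow", 1),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple assumption)."""
    return text.lower().split()


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MeanEmbeddingClassifier:
    """Hypothetical sketch: mean of learned token vectors -> logistic output."""

    def __init__(self, dim=16, lr=0.3, seed=0):
        self.dim, self.lr = dim, lr
        self.rng = random.Random(seed)
        self.emb = {}  # token -> vector, created when first seen in training
        self.w = [self.rng.uniform(-0.1, 0.1) for _ in range(dim)]
        self.b = 0.0

    def _new_vec(self):
        return [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]

    def _mean(self, vecs):
        # Message-level vector: average of token vectors (zeros if none known).
        if not vecs:
            return [0.0] * self.dim
        return [sum(v[i] for v in vecs) / len(vecs) for i in range(self.dim)]

    def predict_proba(self, text):
        # Tokens never seen in training are ignored at inference time.
        vecs = [self.emb[t] for t in tokenize(text) if t in self.emb]
        h = self._mean(vecs)
        return sigmoid(sum(wi * hi for wi, hi in zip(self.w, h)) + self.b)

    def predict(self, text):
        return 1 if self.predict_proba(text) >= 0.5 else 0

    def log_loss(self, data):
        """Mean binary cross-entropy over (text, label) pairs."""
        total = 0.0
        for text, y in data:
            p = min(max(self.predict_proba(text), 1e-12), 1 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(data)

    def fit(self, data, epochs=300):
        """Plain SGD on log-loss, updating embeddings and output layer jointly."""
        for _ in range(epochs):
            for text, y in data:
                tokens = tokenize(text)
                if not tokens:
                    continue
                for t in tokens:
                    if t not in self.emb:
                        self.emb[t] = self._new_vec()
                vecs = [self.emb[t] for t in tokens]
                h = self._mean(vecs)
                p = sigmoid(sum(wi * hi for wi, hi in zip(self.w, h)) + self.b)
                g = p - y  # derivative of log-loss w.r.t. the logit
                # Gradient w.r.t. the pooled vector, taken before w is updated.
                dh = [g * wi for wi in self.w]
                for i in range(self.dim):
                    self.w[i] -= self.lr * g * h[i]
                self.b -= self.lr * g
                # Mean pooling: each token occurrence receives 1/n of dh.
                n = len(vecs)
                for v in vecs:
                    for i in range(self.dim):
                        v[i] -= self.lr * dh[i] / n
        return self


if __name__ == "__main__":
    clf = MeanEmbeddingClassifier().fit(TRAIN)
    for text, _ in TRAIN:
        print(text, "->", clf.predict(text))
```

Mean pooling keeps the message-level representation independent of comment length. In a fuller system, pretrained embeddings could presumably replace the randomly initialised vectors while keeping the same logistic classification head.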
