CL, May 12, 2022

Findings of the Shared Task on Offensive Span Identification from Code-Mixed Tamil-English Comments

arXiv:2205.06118v1, 54 citations, h-index: 44
Originality: Synthesis-oriented
AI Analysis

This work addresses offensive content moderation for Tamil-English code-mixed social media. Its contribution is incremental: it extends existing comment-level classification tasks with span-level annotations.

The paper addresses the problem of identifying offensive spans in Tamil-English code-mixed social media comments by releasing an annotated dataset. Submitted systems were evaluated by F1-score, with top submissions reaching around 0.75.
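The summary above reports F1-scores but does not say how span-level F1 was computed. The following is a minimal sketch, assuming character-offset scoring of the kind used in SemEval-2021 Toxic Spans Detection; the function names `span_f1` and `mean_span_f1` are hypothetical and are not taken from the shared task's own evaluation code.

```python
from typing import Iterable, List, Tuple


def span_f1(predicted: Iterable[int], gold: Iterable[int]) -> float:
    """F1 between predicted and gold character offsets for one comment.

    Assumption: spans are represented as collections of character
    indices, so overlap is measured per character.
    """
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # no offensive span exists and none was predicted
    if not pred or not ref:
        return 0.0  # one side is empty, the other is not
    overlap = len(pred & ref)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mean_span_f1(pairs: List[Tuple[Iterable[int], Iterable[int]]]) -> float:
    """Average the per-comment F1 over (predicted, gold) pairs."""
    if not pairs:
        return 0.0
    return sum(span_f1(p, g) for p, g in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical offsets: the prediction covers characters 0-3,
    # the gold annotation covers characters 2-5.
    print(span_f1(range(0, 4), range(2, 6)))
```

Scoring per character rather than per token avoids having to agree on a tokenizer for code-mixed text, which is one reason this variant is common for span tasks; whether it matches this task's official metric is an assumption here.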

Offensive content moderation is vital for supporting healthy online discussions on social media platforms. However, existing work on code-mixed Dravidian languages is limited to classifying whole comments, without identifying the parts that contribute to offensiveness. This limitation is primarily due to the lack of data annotated with offensive spans. Accordingly, in this shared task, we provide Tamil-English code-mixed social media comments annotated with offensive spans. This paper outlines the released dataset, the methods, and the results of the submitted systems.

