CL · SI · Dec 9, 2019

Analysis of the Ethiopic Twitter Dataset for Abusive Speech in Amharic

arXiv:1912.04419v1 · 18 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses abusive speech detection for Amharic speakers on Twitter, but it is incremental as it focuses on dataset analysis without introducing new methods.

The authors tackle abusive speech recognition in Amharic by analyzing the first Ethiopic Twitter dataset: they examine how abusive content is distributed and how it trends over time, and they compare the Twitter data against a general reference corpus.

In this paper, we present an analysis of the first Ethiopic Twitter dataset for the Amharic language, aimed at recognizing abusive speech. The dataset consists of tweets written in the Fidel script and has been collected since 2014. Since several languages can be written in the Fidel script, we used existing Amharic, Tigrinya, and Ge'ez corpora to retain only the Amharic tweets. We analyzed the tweets for abusive speech content with two goals: to analyze the distribution and tendency of abusive speech content over time, and to compare the abusive speech content of the Twitter corpus with that of a general reference Amharic corpus.
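
The abstract does not say how the reference corpora were used to isolate Amharic tweets. As an illustration only, the following Python sketch shows one plausible approach: keep tweets that are mostly written in Fidel, then assign each tweet to the language whose vocabulary covers the most of its tokens. The function names, the `min_ratio` threshold, and the vocabulary-overlap scoring are all assumptions made for this sketch and are not taken from the paper.

```python
import re

# Letters of the main Unicode Ethiopic block (U+1200-U+135A).
# Assumption: the Supplement and Extended blocks are ignored for simplicity.
FIDEL_WORD = re.compile(r"[\u1200-\u135A]+")


def is_fidel(text, min_ratio=0.5):
    """Return True if at least min_ratio of the letters in text are Fidel."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    fidel = sum(1 for ch in letters if "\u1200" <= ch <= "\u135A")
    return fidel / len(letters) >= min_ratio


def tokenize(text):
    """Split text into runs of Fidel letters, dropping punctuation and Latin text."""
    return FIDEL_WORD.findall(text)


def guess_language(text, vocabularies):
    """Pick the language whose vocabulary contains the most tokens of text.

    vocabularies maps a language name to a set of known words, for example
    built from a reference corpus. Returns None when no vocabulary matches
    or when the best score is tied.
    """
    tokens = tokenize(text)
    if not tokens:
        return None
    scores = {
        lang: sum(tok in vocab for tok in tokens)
        for lang, vocab in vocabularies.items()
    }
    best = max(scores, key=scores.get)
    top = scores[best]
    if top == 0 or list(scores.values()).count(top) > 1:
        return None
    return best


def keep_amharic(tweets, vocabularies, target="amharic"):
    """Keep only Fidel-script tweets whose best-matching language is target."""
    return [
        t for t in tweets
        if is_fidel(t) and guess_language(t, vocabularies) == target
    ]
```

In this sketch, `vocabularies` would hold one word set per reference corpus (Amharic, Tigrinya, Ge'ez). Tweets with no clear winner are dropped rather than guessed, which favors precision over recall; a character n-gram model would likely handle short tweets and shared vocabulary better than raw word overlap.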

