Characterizing Linguistic Attributes for Automatic Classification of Intent Based Racist/Radicalized Posts on Tumblr Micro-Blogging Website
This addresses the need for social media intelligence to detect hate speech, though it is incremental as it applies existing methods to a specific domain.
The paper tackled the problem of automatically identifying racist or radicalized intent in Tumblr posts by developing a cascaded ensemble learning classifier that uses semantic, sentiment, and linguistic features, achieving effective results with discriminatory features like emotion tone and language cues.
Research shows that many like-minded people use popular microblogging websites for posting hateful speech against various religions and race. Automatic identification of racist and hate promoting posts is required for building social media intelligence and security informatics based solutions. However, just keyword spotting based techniques cannot be used to accurately identify the intent of a post. In this paper, we address the challenge of the presence of ambiguity in such posts by identifying the intent of author. We conduct our study on Tumblr microblogging website and develop a cascaded ensemble learning classifier for identifying the posts having racist or radicalized intent. We train our model by identifying various semantic, sentiment and linguistic features from free-form text. Our experimental results shows that the proposed approach is effective and the emotion tone, social tendencies, language cues and personality traits of a narrative are discriminatory features for identifying the racist intent behind a post.