CL · Dec 29, 2023

TuPy-E: detecting hate speech in Brazilian Portuguese social media with a novel dataset and comprehensive analysis of models

Felipe Oliveira, Victoria Reis, Nelson Ebecken

arXiv:2312.17704v1 · 5 citations · h-index: 2 · Has Code

Originality: Synthesis-oriented

AI Analysis

The paper addresses hate speech detection in Portuguese. The contribution is incremental: it applies existing methods to a specific language.

The paper tackles hate speech detection in Brazilian Portuguese social media by introducing TuPy-E, the largest annotated corpus for this task, and conducts a detailed analysis using BERT models to advance detection capabilities.

Social media has become integral to human interaction, providing a platform for communication and expression. However, the rise of hate speech on these platforms poses significant risks to individuals and communities. Detecting and addressing hate speech is particularly challenging in languages like Portuguese due to its rich vocabulary, complex grammar, and regional variations. To address this, we introduce TuPy-E, the largest annotated Portuguese corpus for hate speech detection. TuPy-E leverages an open-source approach, fostering collaboration within the research community. We conduct a detailed analysis using advanced techniques like BERT models, contributing to both academic understanding and practical applications.
