CL · Jan 28, 2024

cantnlp@LT-EDI-2024: Automatic Detection of Anti-LGBTQ+ Hate Speech in Under-resourced Languages

arXiv:2401.15777v1103 citations · h-index: 3 · LTEDI
Originality: Synthesis-oriented
AI Analysis

This paper addresses the detection of anti-LGBTQ+ hate speech in under-resourced languages, but its contribution is incremental: it builds on existing transformer methods combined with domain adaptation.

The paper tackles anti-LGBTQ+ hate speech detection across ten language conditions, most of them under-resourced, using a transformer-based model. The system placed second for Gujarati and Telugu, with varying performance on the other language conditions.

This paper describes our system for detecting homophobia/transphobia in social media comments, developed as part of the shared task at LT-EDI-2024. We took a transformer-based approach to develop our multiclass classification model for ten language conditions (English, Spanish, Gujarati, Hindi, Kannada, Malayalam, Marathi, Tamil, Tulu, and Telugu). We introduced synthetic and organic instances of script-switched language data during domain adaptation to mirror the linguistic realities of social media language as seen in the labelled training data. Our system ranked second for Gujarati and Telugu, with varying levels of performance for the other language conditions. The results suggest that incorporating elements of paralinguistic behaviour such as script-switching may improve the performance of language detection systems, especially for under-resourced language conditions.
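To make the idea of synthetic script-switched data concrete, here is a minimal sketch of one way such instances could be generated. It is not the authors' pipeline: the abstract does not say how the synthetic data was produced. The word-level lexicon `TOY_LEXICON`, the function `script_switch`, and the `rate` and `seed` parameters are all hypothetical and exist only for illustration.

```python
import random

# Hypothetical toy lexicon mapping Devanagari words to romanized forms.
# A real system would need a proper transliteration resource instead.
TOY_LEXICON = {"नमस्ते": "namaste", "दोस्त": "dost", "अच्छा": "accha"}


def script_switch(text, lexicon=TOY_LEXICON, rate=0.5, seed=None):
    """Return text with some whitespace tokens swapped to their romanized form.

    Each token found in the lexicon is replaced with probability `rate`;
    tokens not in the lexicon are left unchanged.
    """
    rng = random.Random(seed)
    out = []
    for token in text.split():
        if token in lexicon and rng.random() < rate:
            out.append(lexicon[token])
        else:
            out.append(token)
    return " ".join(out)


if __name__ == "__main__":
    # Produce a few script-switched variants of one sentence.
    sentence = "नमस्ते दोस्त"
    for s in range(3):
        print(script_switch(sentence, rate=0.5, seed=s))
```

Variants like these could be mixed into the unlabelled text used for domain adaptation, so that the model sees the same word in both native and Latin script. Whether that matches the authors' setup is an assumption.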
