CL · Apr 28, 2023

HausaNLP at SemEval-2023 Task 10: Transfer Learning, Synthetic Data and Side-Information for Multi-Level Sexism Classification

arXiv:2305.00076v1222 citations · h-index: 20
Originality: Synthesis-oriented
AI Analysis

This work addresses the problem of detecting and classifying online sexism for content moderation, but it is incremental, building on existing models and methods.

The paper tackles multi-level sexism classification of online English text, investigating transfer learning with XLM-T and HateBERT, synthetic labelling of unlabelled data, and side-information. Its Task A submission in the shared task achieved an F1-score of 0.82, which the authors report as competitive: it trailed the best system by only 0.052% F1-score.

We present the findings of our participation in SemEval-2023 Task 10: Explainable Detection of Online Sexism (EDOS), a shared task on offensive language (sexism) detection using an English dataset drawn from Gab and Reddit. We investigated the effects of transferring two language models, XLM-T (sentiment classification) and HateBERT (same domain: Reddit), for multi-level classification into Sexist or Not Sexist, followed by further sub-classification of the sexist data. We also used synthetic classification of an unlabelled dataset and intermediary class information to maximize the performance of our models. We submitted a system for Task A, and it ranked 49th with an F1-score of 0.82. This result is competitive, as it under-performed the best system by only 0.052% F1-score.
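The abstract does not spell out how the synthetic classification of the unlabelled data was carried out. A minimal sketch of one common approach, confidence-thresholded pseudo-labelling, is given below. The function `pseudo_label`, the stand-in classifier `toy_model`, the placeholder texts, and the 0.9 threshold are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple


def pseudo_label(
    texts: List[str],
    predict_proba: Callable[[str], float],
    threshold: float = 0.9,  # assumed cut-off, not a value from the paper
) -> List[Tuple[str, int]]:
    """Return (text, label) pairs for confidently classified texts only.

    predict_proba gives the probability that a text is Sexist (label 1).
    """
    labelled = []
    for text in texts:
        p_sexist = predict_proba(text)
        # Confidence is the probability of whichever class is predicted.
        confidence = max(p_sexist, 1.0 - p_sexist)
        if confidence >= threshold:
            labelled.append((text, int(p_sexist >= 0.5)))
    return labelled


def toy_model(text: str) -> float:
    """Hypothetical stand-in for a fine-tuned classifier such as HateBERT."""
    scores = {"example a": 0.97, "example b": 0.55, "example c": 0.04}
    return scores[text]


# Only the confidently classified examples become synthetic training data.
synthetic = pseudo_label(["example a", "example b", "example c"], toy_model)
```

In this sketch the uncertain example ("example b") is dropped, so only high-confidence predictions would be added to the training set. A lower threshold would yield more synthetic data at the cost of noisier labels.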

