mdok-style at SemEval-2026 Task 9: Finetuning LLMs for Multilingual Polarization Detection
This work addresses the need for automated detection of online polarization to prevent escalation into hate speech, targeting social media platforms and online communities.
The authors finetuned mid-size LLMs with QLoRA for multilingual polarization detection across 22 languages, achieving robust performance by augmenting training data with anonymized, lower-cased, upper-cased, and homoglyphied variants.
SemEval-2026 Task 9 focuses on multilingual polarization detection. Specifically, it covers the identification of multilingual, multicultural, and multievent polarization along three axes, organized as subtasks: detection, type, and manifestation. Online polarization is a concern because it is often followed by hate speech, offensive discourse, and social fragmentation. Detecting it before it escalates is therefore crucial for a safer and more inclusive online space. We addressed this task by finetuning mid-size LLMs for sequence classification using QLoRA, a parameter-efficient finetuning technique. We augmented the multilingual training sets (22 languages) with anonymized, lower-cased, upper-cased, and homoglyphied counterparts of the original texts to make the detection more robust.
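The augmentation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the placeholder tokens `[URL]` and `[USER]`, the regular expressions, the small Latin-to-Cyrillic `HOMOGLYPHS` table, and the function names are all assumptions chosen for illustration. Each training text yields up to four extra variants that keep the original label.

```python
import re

# Hypothetical homoglyph table (assumption): a few Latin letters mapped to
# visually similar Cyrillic letters. A real table would be much larger.
HOMOGLYPHS = {
    "a": "\u0430",
    "c": "\u0441",
    "e": "\u0435",
    "o": "\u043e",
    "p": "\u0440",
    "x": "\u0445",
}

# Illustrative patterns (assumption): URLs and @-style user mentions.
URL_RE = re.compile(r"https?://\S+")
USER_RE = re.compile(r"@\w+")


def anonymize(text):
    """Replace URLs and user mentions with placeholder tokens."""
    text = URL_RE.sub("[URL]", text)
    return USER_RE.sub("[USER]", text)


def homoglyphify(text):
    """Swap characters for look-alikes from the HOMOGLYPHS table."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)


def augment(text):
    """Return the original text plus its distinct augmented variants."""
    candidates = [
        text,
        anonymize(text),
        text.lower(),
        text.upper(),
        homoglyphify(text),
    ]
    variants = []
    for candidate in candidates:
        # Skip variants identical to one already kept (e.g. caseless scripts).
        if candidate not in variants:
            variants.append(candidate)
    return variants


def augment_dataset(examples):
    """Expand (text, label) pairs; every variant keeps the original label."""
    return [(variant, label) for text, label in examples for variant in augment(text)]
```

Dropping duplicate variants is a design choice of this sketch: for scripts without letter case, or texts without mentions, several transformations return the input unchanged, and keeping those copies would simply over-weight such examples.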