CL · CY · Apr 23

MKJ at SemEval-2026 Task 9: A Comparative Study of Generalist, Specialist, and Ensemble Strategies for Multilingual Polarization

arXiv:2604.21370

Predicted impact: top 54% in CL (last 90 days) · Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of detecting political polarization in multilingual social media texts, but the approach is incremental: it combines existing models with a per-language selection strategy.

The paper presents a language-adaptive framework for multilingual polarization detection across 22 languages. It achieves a macro-averaged F1 of 0.796 and an average accuracy of 0.826 by selecting, for each language, among generalist, specialist, and ensemble models according to development-set performance.
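The headline figures are averages over the 22 tracks. As a minimal sketch, assuming binary labels (polarized vs. not polarized), per-track macro-F1 computed over both classes, and an unweighted mean across tracks (the listing does not state exactly how per-track scores are aggregated), the computation could look like this. The track names and toy predictions are hypothetical.

```python
from statistics import mean


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (F1 = 2TP / (2TP + FP + FN))."""
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # Convention assumed here: a class with no true positives scores 0.
        per_class.append(0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn))
    return mean(per_class)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def overall_scores(tracks):
    """tracks: language code -> (gold labels, predicted labels).

    Returns (mean per-track macro-F1, mean per-track accuracy), i.e. an
    unweighted average over tracks (an assumption about aggregation).
    """
    f1s = [macro_f1(gold, pred) for gold, pred in tracks.values()]
    accs = [accuracy(gold, pred) for gold, pred in tracks.values()]
    return mean(f1s), mean(accs)


# Hypothetical toy tracks; 1 = polarized, 0 = not polarized.
tracks = {
    "lang_a": ([1, 0, 1, 0], [1, 0, 0, 0]),
    "lang_b": ([1, 1, 0], [1, 1, 0]),
}
overall_f1, overall_acc = overall_scores(tracks)
print(round(overall_f1, 3), round(overall_acc, 3))  # 0.867 0.875
```

Averaging per track gives every language equal weight regardless of how many test examples it has, which matches reporting a single score "across all 22 tracks"; a pooled, example-weighted alternative would favour the larger tracks.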

We present a systematic study of multilingual polarization detection across 22 languages for SemEval-2026 Task 9 (Subtask 1), contrasting multilingual generalists with language-specific specialists and hybrid ensembles. While a standard generalist such as XLM-RoBERTa suffices when its tokenizer aligns well with the target text, it can struggle with distinctive scripts (e.g., Khmer, Odia), where monolingual specialists yield significant gains. Rather than enforcing a single universal architecture, we adopt a language-adaptive framework that switches between multilingual generalists, language-specific specialists, and hybrid ensembles based on development-set performance. Cross-lingual augmentation via NLLB-200 yielded mixed results: it often underperformed native architecture selection and degraded performance on morphologically rich tracks. Our final system achieves an overall macro-averaged F1 score of 0.796 and an average accuracy of 0.826 across all 22 tracks. Code and final test predictions are publicly available at: https://github.com/Maziarkiani/SemEval2026-Task9-Subtask1-Polarization.
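The language-adaptive selection described in the abstract can be sketched as follows. This is a minimal illustration under assumptions the abstract does not state: each candidate model outputs a probability for the "polarized" class on the development split, the hybrid ensemble is a plain average of those probabilities (soft voting), and the selection criterion is dev macro-F1. The candidate names, function names, and numbers are hypothetical, not taken from the authors' code.

```python
from statistics import mean


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 (F1 = 2TP / (2TP + FP + FN))."""
    per_class = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        per_class.append(0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn))
    return mean(per_class)


def soft_vote(*prob_lists):
    """Hypothetical hybrid ensemble: average P(polarized) per example."""
    return [mean(ps) for ps in zip(*prob_lists)]


def to_labels(probs, threshold=0.5):
    return [int(p >= threshold) for p in probs]


def select_strategy(dev_gold, dev_probs):
    """Pick the candidate with the highest dev macro-F1 for one language.

    dev_probs maps a candidate name (e.g. "generalist", "specialist") to its
    P(polarized) scores on the dev split; a soft-voting "hybrid" is added.
    """
    candidates = dict(dev_probs)
    candidates["hybrid"] = soft_vote(*dev_probs.values())
    scores = {name: macro_f1(dev_gold, to_labels(probs))
              for name, probs in candidates.items()}
    best = max(scores, key=scores.get)  # ties go to the earliest candidate
    return best, scores


# Hypothetical dev data for one track (1 = polarized).
dev_gold = [1, 0, 1, 0]
dev_probs = {
    "generalist": [0.4, 0.3, 0.6, 0.2],  # e.g. an XLM-RoBERTa-style model
    "specialist": [0.9, 0.6, 0.8, 0.1],  # e.g. a monolingual model
}
best, scores = select_strategy(dev_gold, dev_probs)
print(best)  # hybrid
```

Running `select_strategy` once per language and keeping the winner yields a per-track configuration rather than one universal model. In this toy case the generalist misses a polarized example and the specialist over-predicts, while their average classifies all four correctly. Because `max` returns the first maximum in insertion order, ties resolve toward the candidate listed first, which here is the generalist.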
