CL | Apr 19, 2023

MasakhaNEWS: News Topic Classification for African languages

MILA
arXiv:2304.09972v2 | 142 citations | h-index: 34
Originality: Synthesis-oriented
AI Analysis

This paper addresses the lack of standardized benchmark datasets for African languages in NLP and gives researchers a shared resource. The contribution is incremental, however, since it builds on existing methods for low-resource settings.

The authors tackle the under-representation of African languages in NLP by creating MasakhaNEWS, a benchmark dataset for news topic classification across 16 languages, and by evaluating baseline models on it. In the zero-shot setting, prompting ChatGPT achieved an average of 70 F1 points. In the few-shot setting, pattern exploiting training (PET) reached 86.0 F1 points with 10 examples per label.

African languages are severely under-represented in NLP research due to a lack of datasets covering several NLP tasks. While there are individual language-specific datasets that are being expanded to different tasks, only a handful of NLP tasks (e.g. named entity recognition and machine translation) have standardized benchmark datasets covering several geographically and typologically diverse African languages. In this paper, we develop MasakhaNEWS, a new benchmark dataset for news topic classification covering 16 languages widely spoken in Africa. We provide an evaluation of baseline models by training classical machine learning models and fine-tuning several language models. Furthermore, we explore several alternatives to full fine-tuning of language models that are better suited for zero-shot and few-shot learning, such as cross-lingual parameter-efficient fine-tuning (like MAD-X), pattern exploiting training (PET), prompting language models (like ChatGPT), and prompt-free sentence transformer fine-tuning (SetFit and Cohere Embedding API). Our evaluation in the zero-shot setting shows the potential of prompting ChatGPT for news topic classification in low-resource African languages, achieving an average performance of 70 F1 points without leveraging additional supervision like MAD-X. In the few-shot setting, we show that with as few as 10 examples per label, the PET approach achieves more than 90% of the performance of full supervised training (86.0 vs. 92.6 F1 points).
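The "more than 90%" claim in the abstract can be checked directly from the two F1 figures it quotes. The short sketch below does only that arithmetic; the function and constant names are illustrative and do not come from the paper or its code repository.

```python
def relative_performance(few_shot_f1: float, full_f1: float) -> float:
    """Return the few-shot F1 as a percentage of the fully supervised F1."""
    if full_f1 <= 0:
        raise ValueError("full_f1 must be positive")
    return 100.0 * few_shot_f1 / full_f1


# F1 figures quoted in the abstract (names are hypothetical).
PET_FEW_SHOT_F1 = 86.0      # PET with 10 examples per label
FULL_SUPERVISED_F1 = 92.6   # full supervised training

ratio = relative_performance(PET_FEW_SHOT_F1, FULL_SUPERVISED_F1)
print(f"{ratio:.1f}% of full supervised performance")  # prints 92.9% of ...
```

The result, roughly 92.9%, is consistent with the abstract's statement that PET recovers more than 90% of full supervised performance.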

Code Implementations: 1 repo
