CL · Mar 23

Ara-Best-RQ: Multi Dialectal Arabic SSL

arXiv:2603.21900 · 86.3 · h-index: 19
AI Analysis

This paper addresses the limited resources available for Arabic dialect speech technologies, offering an incremental improvement through targeted pre-training.

The paper tackles multi-dialectal Arabic speech processing by developing Ara-BEST-RQ, a family of self-supervised learning models, achieving state-of-the-art performance in dialect identification with fewer parameters than competitors.

We present Ara-BEST-RQ, a family of self-supervised learning (SSL) models specifically designed for multi-dialectal Arabic speech processing. Leveraging 5,640 hours of crawled Creative Commons speech and combining it with publicly available datasets, we pre-train conformer-based BEST-RQ models up to 600M parameters. Our models are evaluated on dialect identification (DID) and automatic speech recognition (ASR) tasks, achieving state-of-the-art performance on the former while using fewer parameters than competing models. We demonstrate that family-targeted pre-training on Arabic dialects significantly improves downstream performance compared to multilingual or monolingual models trained on non-Arabic data. All models, code, and pre-processed datasets will be publicly released to support reproducibility and further research in Arabic speech technologies.