CL, Sep 27, 2025

AraS2P: Arabic Speech-to-Phonemes System

arXiv:2509.23504v1
3 citations
h-index: 4
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
Originality: Incremental advance
AI Analysis

This work addresses phoneme-level mispronunciation detection in Arabic. The advance is incremental: it builds on existing models with targeted adaptations.

The paper tackles Arabic speech-to-phonemes conversion for mispronunciation detection by adapting Wav2Vec2-BERT with a two-stage training strategy. The resulting system placed first on the Iqra'Eval 2025 leaderboard.

This paper describes AraS2P, our speech-to-phonemes system submitted to the Iqra'Eval 2025 Shared Task. We adapted Wav2Vec2-BERT via a two-stage training strategy. In the first stage, task-adaptive continued pretraining was performed on large-scale Arabic speech-phoneme datasets, whose phoneme labels were generated by converting the Arabic text with the MSA Phonetiser. In the second stage, the model was fine-tuned on the official shared task data, augmented with XTTS-v2-synthesized recitations that vary the Ayat segments, the speaker embeddings, and the text itself, which was perturbed to simulate possible human errors. The system ranked first on the official leaderboard, demonstrating that phoneme-aware pretraining combined with targeted augmentation yields strong performance in phoneme-level mispronunciation detection.
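
The abstract does not specify how the textual perturbations were produced. As a rough illustration only, the sketch below shows one plausible way to corrupt a token sequence (words, characters, or phonemes) with random substitutions, deletions, and insertions before passing it to a synthesizer. The function name `perturb_tokens`, the `rate` parameter, and the choice of three edit operations are assumptions made for this example, not the authors' method.

```python
import random


def perturb_tokens(tokens, rate=0.1, vocab=None, seed=None):
    """Hypothetical helper: randomly edit tokens to mimic reading errors.

    Each token is perturbed with probability `rate`. A perturbed token is
    substituted, deleted, or followed by an inserted token, chosen uniformly.
    """
    rng = random.Random(seed)
    # Assumption: replacement tokens come from `vocab`, or from the input itself.
    vocab = list(vocab) if vocab else sorted(set(tokens))
    out = []
    for tok in tokens:
        if rng.random() >= rate:
            out.append(tok)  # keep the token unchanged
            continue
        op = rng.choice(("substitute", "delete", "insert"))
        if op == "substitute":
            out.append(rng.choice(vocab))
        elif op == "insert":
            out.append(tok)
            out.append(rng.choice(vocab))
        # "delete": append nothing
    return out


if __name__ == "__main__":
    # Placeholder tokens; real input would be Arabic text or phoneme symbols.
    reference = ["t1", "t2", "t3", "t4", "t5", "t6"]
    print(perturb_tokens(reference, rate=0.3, seed=0))
```

Under this setup, the perturbed sequence would be what the speaker is synthesized as saying, while the unperturbed sequence stays as the reference, so the pair marks where a mispronunciation occurred.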
