CL
Sep 18, 2025

HArnESS: Lightweight Distilled Arabic Speech Foundation Models

arXiv:2509.14689v1
h-index: 12
Originality: Incremental advance
AI Analysis

This work addresses the deployment of speech models for Arabic-language applications in resource-limited settings. It is an incremental advance, achieved through distillation and compression.

The paper addresses the impracticality of deploying large pre-trained speech models in resource-limited environments. It introduces HArnESS, a family of Arabic-centric self-supervised speech models that achieves state-of-the-art or comparable performance on Arabic ASR, SER, and DID tasks with minimal fine-tuning.

Large pre-trained speech models excel in downstream tasks but their deployment is impractical for resource-limited environments. In this paper, we introduce HArnESS, the first Arabic-centric self-supervised speech model family, designed to capture Arabic speech nuances. Using iterative self-distillation, we train large bilingual HArnESS (HL) SSL models and then distill knowledge into compressed student models (HS, HST), preserving Arabic-specific representations. We use low-rank approximation to further compact the teacher's discrete supervision into shallow, thin models. We evaluate HArnESS on Arabic ASR, Speaker Emotion Recognition (SER), and Dialect Identification (DID), demonstrating effectiveness against HuBERT and XLS-R. With minimal fine-tuning, HArnESS achieves SOTA or comparable performance, making it a lightweight yet powerful alternative for real-world use. We release our distilled models and findings to support responsible research and deployment in low-resource settings.
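
The abstract does not say how the low-rank approximation is applied, so the following is only a generic illustration of the idea, not the HArnESS method. It is a minimal sketch, assuming a single dense weight matrix is factorized with truncated SVD. The function name `low_rank_factors`, the 256 x 256 layer size, and the rank of 32 are all hypothetical choices made for the example.

```python
import numpy as np


def low_rank_factors(weight, rank):
    """Factor `weight` into A @ B, its best rank-`rank` approximation
    in the Frobenius norm (truncated SVD, Eckart-Young theorem)."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # shape (out_dim, rank), singular values folded in
    b = vt[:rank, :]            # shape (rank, in_dim)
    return a, b


# Hypothetical dense layer; the sizes are illustrative, not taken from the paper.
rng = np.random.default_rng(0)
weight = rng.standard_normal((256, 256))

a, b = low_rank_factors(weight, rank=32)

params_full = weight.size      # 256 * 256 parameters in the original layer
params_low = a.size + b.size   # 2 * 256 * 32 parameters in the two factors
rel_error = np.linalg.norm(weight - a @ b) / np.linalg.norm(weight)

print(f"parameters: {params_full} -> {params_low}")
print(f"relative reconstruction error: {rel_error:.3f}")
```

Replacing one dense layer with the two factors cuts the parameter count whenever the rank is below half the layer width. A random matrix like this one reconstructs poorly at low rank; the trade-off is only attractive when the matrix being compressed is close to low-rank in the first place.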
