AS · CL · LG · SD · Sep 20, 2023

Leveraging Data Collection and Unsupervised Learning for Code-switched Tunisian Arabic Automatic Speech Recognition

arXiv:2309.11327v2 · 10 citations · h-index: 2
Originality: Synthesis-oriented
AI Analysis

This paper addresses data scarcity and linguistic diversity in ASR for the Tunisian dialect. The contribution is incremental: it applies existing methods to a new domain.

The paper tackles Automatic Speech Recognition for code-switched Tunisian Arabic by collecting and annotating data, then applying self-supervision, semi-supervision, and few-shot learning approaches to achieve state-of-the-art results on various Tunisian test sets, with all models and data released publicly.

Crafting an effective Automatic Speech Recognition (ASR) solution for dialects demands innovative approaches that not only address the data scarcity issue but also navigate the intricacies of linguistic diversity. In this paper, we address this ASR challenge, focusing on the Tunisian dialect. First, textual and audio data are collected and in some cases annotated. Second, we explore self-supervision, semi-supervision and few-shot code-switching approaches to push the state of the art on different Tunisian test sets covering different acoustic, linguistic and prosodic conditions. Finally, given the absence of conventional spelling, we produce a human evaluation of our transcripts to avoid the noise coming from spelling inadequacies in our testing references. Our models, which transcribe audio samples in a linguistic mix involving Tunisian Arabic, English and French, and all the data used during training and testing are released for public use and further improvements.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
