CL, AI, LG, NI
Nov 1, 2021

ASMDD: Arabic Speech Mispronunciation Detection Dataset

arXiv:2111.01136v1
2 citations
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the need for resources in Arabic speech processing, specifically for mispronunciation detection in children, but is incremental as it primarily offers new data rather than novel methods.

The authors introduced ASMDD, the largest dataset for Arabic speech mispronunciation detection, focusing on Egyptian children's pronunciations of the top 100 Arabic words, with annotations by expert listeners.

This paper introduces the largest dataset for Arabic speech mispronunciation detection in Egyptian dialogues. The dataset consists of annotated audio files of the 100 most frequently used words in the Arabic language, pronounced by 100 Egyptian children aged between 2 and 8 years. The recordings were collected and annotated for segmental pronunciation errors by expert listeners.
