SD · AI · Apr 21

Tadabur: A Large-Scale Quran Audio Dataset

arXiv:2604.18932 · 25.5
Predicted impact: top 78% in SD · last 90 days
Originality: Synthesis-oriented
AI Analysis

For researchers in Quranic speech processing, this dataset fills a gap by providing a large, diverse resource for developing and benchmarking models.

The authors created Tadabur, a Quran audio dataset with over 1400 hours from 600+ reciters, to address the lack of large-scale, diverse Quranic speech data. The dataset provides substantial variation in recitation styles and recording conditions.

Despite growing interest in Quranic data research, existing Quran datasets remain limited in both scale and diversity. To address this gap, we present Tadabur, a large-scale Quran audio dataset. Tadabur comprises more than 1,400 hours of recitation audio from over 600 distinct reciters, providing substantial variation in recitation styles, vocal characteristics, and recording conditions. This diversity makes Tadabur a comprehensive and representative resource for Quranic speech research and analysis. By significantly expanding both the total duration and variability of available Quran data, Tadabur aims to support future research and facilitate the development of standardized Quranic speech benchmarks.
