Oct 27, 2025

Treble10: A high-quality dataset for far-field speech recognition, dereverberation, and enhancement

Sarabeth S. Mullins, Georg Götz, Eric Bezzam, Steven Zheng, Daniel Gert Nielsen

arXiv:2510.23141v1

h-index: 1

Originality: Incremental advance

AI Analysis

Treble10 provides a high-quality benchmark and data augmentation resource for researchers working on far-field speech recognition, dereverberation, and enhancement.

The authors address the shortage of far-field speech datasets by introducing Treble10, a large-scale dataset of over 3000 room impulse responses simulated in 10 real-world rooms. The simulations use a hybrid wave-based and geometrical acoustics approach, intended to bridge the realism gap between measurement and simulation.

Accurate far-field speech datasets are critical for tasks such as automatic speech recognition (ASR), dereverberation, speech enhancement, and source separation. However, current datasets are limited by the trade-off between acoustic realism and scalability. Measured corpora provide faithful physics but are expensive, low-coverage, and rarely include paired clean and reverberant data. In contrast, most simulation-based datasets rely on simplified geometrical acoustics, thus failing to reproduce key physical phenomena like diffraction, scattering, and interference that govern sound propagation in complex environments. We introduce Treble10, a large-scale, physically accurate room-acoustic dataset. Treble10 contains over 3000 broadband room impulse responses (RIRs) simulated in 10 fully furnished real-world rooms, using a hybrid simulation paradigm implemented in the Treble SDK that combines a wave-based and geometrical acoustics solver. The dataset provides six complementary subsets, spanning mono, 8th-order Ambisonics, and 6-channel device RIRs, as well as pre-convolved reverberant speech scenes paired with LibriSpeech utterances. All signals are simulated at 32 kHz, accurately modelling low-frequency wave effects and high-frequency reflections. Treble10 bridges the realism gap between measurement and simulation, enabling reproducible, physically grounded evaluation and large-scale data augmentation for far-field speech tasks. The dataset is openly available via the Hugging Face Hub, and is intended as both a benchmark and a template for next-generation simulation-driven audio research.
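The pre-convolved scenes mentioned in the abstract pair clean utterances with RIRs. A minimal sketch of that pairing step follows, assuming a mono RIR and a mono signal at the stated 32 kHz rate. The RIR here is a synthetic placeholder (exponentially decaying noise), not a Treble10 response, and the function names `synthetic_rir` and `reverberate` are hypothetical, chosen for illustration only. The abstract does not describe the authors' convolution pipeline, so this should be read as a generic example of RIR-based augmentation rather than their method.

```python
import numpy as np

SAMPLE_RATE = 32_000  # Hz, the rate stated in the abstract


def synthetic_rir(duration_s=0.3, rt60_s=0.25, sample_rate=SAMPLE_RATE, seed=0):
    """Placeholder RIR: exponentially decaying noise, NOT a Treble10 response."""
    rng = np.random.default_rng(seed)
    n = int(duration_s * sample_rate)
    t = np.arange(n) / sample_rate
    # Envelope drops by 60 dB over rt60_s seconds.
    envelope = 10.0 ** (-3.0 * t / rt60_s)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # crude direct-path impulse
    return rir / np.max(np.abs(rir))


def reverberate(clean, rir):
    """Full linear convolution of a mono signal with a mono RIR, via FFT."""
    n_out = len(clean) + len(rir) - 1
    n_fft = 1 << (n_out - 1).bit_length()  # next power of two >= n_out
    spectrum = np.fft.rfft(clean, n_fft) * np.fft.rfft(rir, n_fft)
    return np.fft.irfft(spectrum, n_fft)[:n_out]


# Stand-in for a clean utterance: one second of a 440 Hz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clean = 0.5 * np.sin(2 * np.pi * 440.0 * t)

rir = synthetic_rir()
wet = reverberate(clean, rir)
```

In practice, `clean` would be a speech recording and `rir` one of the dataset's responses, both at the same sample rate. A multichannel RIR could be handled by applying `reverberate` once per channel.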

