AS, LG, SD
Jun 10, 2024

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

arXiv:2406.06185v2, 125 citations
Originality: Synthesis-oriented
AI Analysis

This paper gives speech processing researchers a new benchmark dataset, but its contribution is incremental: it focuses on data creation and evaluation rather than novel methods.

The authors released the EARS dataset, a 100-hour anechoic speech dataset with 107 diverse speakers, and benchmarked speech enhancement and dereverberation methods, finding that a generative method was preferred in a listening test on the enhancement task.

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a large range of different speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and the automatic evaluation server can be found online.

Code Implementations: 2 repos
