AS LG SD Jun 10, 2024

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation

Julius Richter, Yi-Chiao Wu, Steven Krenn, Simon Welker, Bunlong Lay, Shinji Watanabe, Alexander Richard, Timo Gerkmann

arXiv:2406.06185v221.9125 citations Has Code

Originality Synthesis-oriented

AI Analysis

This paper provides a new benchmark dataset for speech processing researchers, but the contribution is incremental: it focuses on data creation and evaluation rather than on novel methods.

The authors released the EARS dataset, a 100-hour anechoic speech dataset with 107 diverse speakers, and benchmarked speech enhancement and dereverberation methods, finding a generative method preferred in listening tests.

We release the EARS (Expressive Anechoic Recordings of Speech) dataset, a high-quality speech dataset comprising 107 speakers from diverse backgrounds, totaling 100 hours of clean, anechoic speech data. The dataset covers a wide range of speaking styles, including emotional speech, different reading styles, non-verbal sounds, and conversational freeform speech. We benchmark various methods for speech enhancement and dereverberation on the dataset and evaluate their performance through a set of instrumental metrics. In addition, we conduct a listening test with 20 participants for the speech enhancement task, where a generative method is preferred. We introduce a blind test set that allows for automatic online evaluation of uploaded data. Dataset download links and the automatic evaluation server can be found online.
