AS SD Sep 16, 2021

DDS: A new device-degraded speech dataset for speech enhancement

arXiv:2109.07931v41.2

Originality Synthesis-oriented

AI Analysis

The paper provides a useful resource for speech enhancement research, but its contribution is incremental: it focuses on dataset creation rather than on a novel method.

The paper tackles the problem of enhancing speech recorded on consumer devices in uncontrolled environments by introducing a new dataset, DDS, which provides approximately 2,000 hours of aligned high- and low-quality speech across 27 realistic conditions, and shows the impact of recording diversity on baseline system performance.

A large and growing amount of speech content in real-life scenarios is being recorded on consumer-grade devices in uncontrolled environments, resulting in degraded speech quality. Transforming such low-quality device-degraded speech into high-quality speech is a goal of speech enhancement (SE). This paper introduces a new speech dataset, DDS, to facilitate research on SE. DDS provides aligned parallel recordings of high-quality speech (recorded in professional studios) and a number of versions of low-quality speech, producing approximately 2,000 hours of speech data. The DDS dataset covers 27 realistic recording conditions by combining diverse acoustic environments and microphone devices, and each version of a condition consists of multiple recordings from six microphone positions to simulate different noise and reverberation levels. We also test several SE baseline systems on the DDS dataset and show the impact of recording diversity on performance.
