ASSD | Nov 7, 2020

ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

arXiv:2011.03706v1 | 91 citations | Has Code
AI Analysis

This toolkit addresses the need for unified speech enhancement/separation development with ASR integration, though it appears incremental as an extension of existing ESPnet infrastructure.

The authors developed ESPnet-SE, an end-to-end toolkit for speech enhancement and separation designed to integrate with automatic speech recognition systems, providing all-in-one recipes for processing single- and multi-channel data across benchmark datasets.

We present ESPnet-SE, which is designed for the quick development of speech enhancement and speech separation systems in a single framework, along with an optional downstream speech recognition module. ESPnet-SE is a new project that integrates a rich set of models, resources and systems related to automatic speech recognition to support and validate the proposed front-end implementation (i.e., speech enhancement and separation). It is capable of processing both single-channel and multi-channel data, with functionalities including dereverberation, denoising and source separation. We provide all-in-one recipes including data pre-processing, feature extraction, training and evaluation pipelines for a wide range of benchmark datasets. This paper describes the design of the toolkit and several important functionalities, especially the speech recognition integration, which differentiates ESPnet-SE from other open-source toolkits, and reports experimental results on major benchmark datasets.
