ASSD Jun 18, 2020

Self-supervised Learning for Speech Enhancement

arXiv:2006.10388v1
35 citations
Originality: Incremental advance
AI Analysis

This paper addresses the need for speech enhancement in noisy environments by reducing reliance on labeled training data. The advance is incremental because it builds on existing autoencoder techniques.

The paper tackles the problem of training speech enhancement networks without labeled data by proposing a self-supervised method that learns a shared latent representation between clean and noisy speech, enabling autonomous mapping from noisy to clean speech.

Supervised learning for single-channel speech enhancement requires carefully labeled training examples where the noisy mixture is input into the network and the network is trained to produce an output close to the ideal target. To relax the conditions on the training data, we consider the task of training speech enhancement networks in a self-supervised manner. We first use a limited training set of clean speech sounds and learn a latent representation by autoencoding on their magnitude spectrograms. We then autoencode on speech mixtures recorded in noisy environments and train the resulting autoencoder to share a latent representation with the clean examples. We show that using this training schema, we can now map noisy speech to its clean version using a network that is autonomously trainable without requiring labeled training examples or human intervention.
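The data flow described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the single-layer `TinyAutoencoder`, the layer sizes, the STFT settings, and in particular the `shared_latent_loss` term are all assumptions made for the example, since the abstract does not specify the architecture or the exact loss. The weights here are random and untrained; a real system would minimize these losses with an optimizer.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft=256, hop=128):
    """Return |STFT| with shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


class TinyAutoencoder:
    """Hypothetical one-layer encoder/decoder pair (illustration only)."""

    def __init__(self, n_bins, n_latent, rng):
        self.W_enc = rng.normal(0.0, 0.1, (n_bins, n_latent))
        self.W_dec = rng.normal(0.0, 0.1, (n_latent, n_bins))

    def encode(self, x):
        return np.maximum(x @ self.W_enc, 0.0)  # ReLU keeps the latent non-negative

    def decode(self, h):
        return np.maximum(h @ self.W_dec, 0.0)  # magnitudes are non-negative


def reconstruction_loss(ae, mag):
    """Mean squared autoencoding error on magnitude spectrogram frames."""
    return float(np.mean((ae.decode(ae.encode(mag)) - mag) ** 2))


def shared_latent_loss(clean_ae, noisy_ae, noisy_mag):
    """ASSUMED consistency term: the noisy latent, decoded by the clean
    decoder and re-encoded by the clean encoder, should stay unchanged."""
    h_noisy = noisy_ae.encode(noisy_mag)
    h_cycle = clean_ae.encode(clean_ae.decode(h_noisy))
    return float(np.mean((h_cycle - h_noisy) ** 2))


def enhance(clean_ae, noisy_ae, noisy_mag):
    """Map noisy frames to a clean estimate: noisy encoder, then clean decoder."""
    return clean_ae.decode(noisy_ae.encode(noisy_mag))


# Demo on synthetic data (white noise stands in for a noisy recording).
rng = np.random.default_rng(0)
noisy_signal = rng.normal(size=16000)
noisy_mag = magnitude_spectrogram(noisy_signal)

n_bins = noisy_mag.shape[1]
clean_ae = TinyAutoencoder(n_bins, 32, rng)  # stage 1: would be trained on clean speech
noisy_ae = TinyAutoencoder(n_bins, 32, rng)  # stage 2: would be trained on noisy mixtures

stage2_loss = reconstruction_loss(noisy_ae, noisy_mag) + shared_latent_loss(
    clean_ae, noisy_ae, noisy_mag
)
enhanced = enhance(clean_ae, noisy_ae, noisy_mag)
print(noisy_mag.shape, enhanced.shape, stage2_loss >= 0.0)
```

Note that no paired clean/noisy example appears anywhere in the sketch: the clean autoencoder only sees clean speech, and the noisy autoencoder only sees mixtures, which is the property the abstract emphasizes.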

Code Implementations: 1 repo
