ASSD · Nov 16, 2021

Unsupervised Speech Enhancement with speech recognition embedding and disentanglement losses

arXiv:2111.08678v2 · 25 citations
Originality: Incremental advance
AI Analysis

This work addresses domain mismatch and ASR degradation in speech enhancement. It is an incremental advance because it builds on the existing MixIT method with modifications.

The authors tackled two problems: the domain mismatch caused by synthetic training data, and the trade-off between speech enhancement and ASR performance. They proposed an unsupervised loss function based on MixIT, extended with speech recognition embedding and disentanglement losses. Their results showed that joint supervised and unsupervised training achieved speech quality similar to, and ASR performance better than, the best supervised baseline on the noisy VoxCeleb dataset.
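Synthetic training data for supervised enhancement is commonly built by adding a noise clip to clean speech at a chosen signal-to-noise ratio (SNR), with the clean speech kept as the training target. The sketch below shows that generic procedure in NumPy; it is not the authors' data recipe. The function name `mix_at_snr` and the tile-and-crop handling of noise clips shorter than the speech are assumptions for illustration. A fixed recipe like this is exactly what real noisy recordings may not match, which is the domain mismatch described above.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db, eps=1e-12):
    """Add `noise` to `speech` so the result has the requested SNR in dB.

    Hypothetical helper for illustration. Returns (noisy_mixture, scaled_noise).
    """
    # Assumption: loop short noise clips, then crop to the speech length.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[: len(speech)]

    # Average power of each signal.
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)

    # Choose the gain so that p_speech / (gain**2 * p_noise) == 10**(snr_db / 10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10) + eps))
    scaled_noise = gain * noise
    return speech + scaled_noise, scaled_noise


# Usage: one second of 16 kHz "speech" mixed with a shorter noise clip at 5 dB SNR.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)
noise = rng.standard_normal(8000)
noisy, scaled_noise = mix_at_snr(speech, noise, snr_db=5.0)
```

In a supervised setup, `noisy` would be the model input and `speech` the target; sampling `snr_db` from a range is a common way to vary the difficulty.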

Speech enhancement has recently achieved great success with various deep learning methods. However, most conventional speech enhancement systems are trained with supervised methods, which pose two significant challenges. First, most training datasets for speech enhancement systems are synthetic. Because these datasets are created by mixing clean speech with noise corpora, a domain mismatch arises between the synthetic data and real-world recordings of noisy speech or audio. Second, there is a trade-off: improving speech enhancement performance tends to degrade automatic speech recognition (ASR) performance. We therefore propose an unsupervised loss function to tackle both problems. It extends the MixIT loss function with a speech recognition embedding loss and a disentanglement loss. Our results show that the proposed function effectively improves speech enhancement performance compared to a baseline trained in a supervised way on the noisy VoxCeleb dataset. While fully unsupervised training does not surpass the corresponding baseline, joint supervised and unsupervised training achieves speech quality similar to, and ASR performance better than, the best supervised baseline.
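MixIT (mixture invariant training), which the proposed loss extends, trains a separation model without clean references. Two noisy mixtures are summed into a "mixture of mixtures", the model separates that sum into M estimated sources, and each estimate is assigned to exactly one of the two original mixtures so that the remixed estimates best reconstruct them. The sketch below implements only this base MixIT objective in NumPy, using an exhaustive search over the 2^M assignments. The speech recognition embedding and disentanglement terms are not specified in this summary and are omitted. The names `mixit_loss` and `neg_snr`, and the use of plain negative SNR as the per-mixture loss (the original MixIT formulation uses a thresholded SNR), are assumptions for illustration.

```python
import itertools

import numpy as np


def neg_snr(ref, est, eps=1e-8):
    """Negative SNR in dB between a reference and an estimate (lower is better)."""
    signal = np.sum(ref ** 2)
    error = np.sum((ref - est) ** 2)
    return -10.0 * np.log10((signal + eps) / (error + eps))


def mixit_loss(x1, x2, est_sources):
    """Base MixIT loss for two reference mixtures (hypothetical helper).

    x1, x2:      arrays of shape (T,), the two noisy input mixtures.
    est_sources: array of shape (M, T), sources the model estimated from x1 + x2.
    Returns (best_loss, best_assignment), where best_assignment[j] in {0, 1}
    says which mixture estimated source j is remixed into.
    """
    refs = np.stack([x1, x2])           # (2, T)
    m = est_sources.shape[0]
    best_loss, best_assign = np.inf, None

    # Enumerate every binary mixing matrix A (2 x M) with one 1 per column.
    for assign in itertools.product((0, 1), repeat=m):
        A = np.zeros((2, m))
        A[list(assign), np.arange(m)] = 1.0
        remixed = A @ est_sources       # (2, T): estimates summed per mixture
        loss = neg_snr(refs[0], remixed[0]) + neg_snr(refs[1], remixed[1])
        if loss < best_loss:
            best_loss, best_assign = loss, assign
    return best_loss, best_assign


# Usage: four toy sources; each mixture holds two of them.
rng = np.random.default_rng(0)
s = rng.standard_normal((4, 1000))
x1, x2 = s[0] + s[1], s[2] + s[3]
est = np.stack([s[0], s[2], s[1], s[3]])   # a perfect but shuffled separation
loss, assign = mixit_loss(x1, x2, est)     # assign recovers (0, 1, 0, 1)
```

Exhaustive search is practical only for small M (it costs 2^M remixes per example); the minimum over assignments is what makes the loss invariant to the order in which the model emits its sources.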
