AS SD · Apr 7, 2019

VoiceID Loss: Speech Enhancement for Speaker Verification

arXiv:1904.03601v2 · 101 citations
Originality: Incremental advance
AI Analysis

This work addresses the robustness of speaker verification systems, particularly in noisy environments. It is an incremental advance that trains speech enhancement using feedback from the verification model.

The paper tackles speaker verification robustness by proposing VoiceID loss, a loss function that uses feedback from a speaker verification model to train a speech enhancement network. The authors report consistent improvements in verification performance under both clean and noisy conditions.

In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification. In contrast to the commonly used loss functions for speech enhancement such as the L2 loss, the VoiceID loss is based on the feedback from a speaker verification model to generate a ratio mask. The generated ratio mask is multiplied pointwise with the original spectrogram to filter out unnecessary components for speaker verification. In the experiments, we observed that the enhancement network, after training with the VoiceID loss, is able to ignore a substantial amount of time-frequency bins, such as those dominated by noise, for verification. The resulting model consistently improves the speaker verification system on both clean and noisy conditions.
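The masking step and the feedback loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy linear speaker model, the cross-entropy objective, and every name (`apply_ratio_mask`, `toy_speaker_logits`, `voiceid_style_loss`) are assumptions made for illustration, and the mask is fixed by hand here, whereas in the paper it is produced by a trained enhancement network.

```python
import math


def apply_ratio_mask(spectrogram, mask):
    """Multiply a ratio mask pointwise with a spectrogram (frames x bins)."""
    return [[s * m for s, m in zip(s_row, m_row)]
            for s_row, m_row in zip(spectrogram, mask)]


def toy_speaker_logits(spectrogram, weights):
    """Hypothetical stand-in for a frozen speaker verification model:
    average over time frames, then one linear score per speaker."""
    n_frames = len(spectrogram)
    n_bins = len(spectrogram[0])
    mean_spec = [sum(frame[f] for frame in spectrogram) / n_frames
                 for f in range(n_bins)]
    return [sum(w * x for w, x in zip(w_row, mean_spec)) for w_row in weights]


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def voiceid_style_loss(spectrogram, mask, weights, speaker_id):
    """Assumed objective: speaker-classification loss on the masked input."""
    enhanced = apply_ratio_mask(spectrogram, mask)
    return cross_entropy(toy_speaker_logits(enhanced, weights), speaker_id)


# Two frames, three frequency bins; bin 2 is dominated by noise.
spectrogram = [[1.0, 0.5, 2.0], [1.0, 0.5, 2.0]]
# Speaker 0 is scored on bin 0; the noisy bin 2 pushes the score to speaker 1.
weights = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

all_pass = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
drop_noise = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]

loss_all_pass = voiceid_style_loss(spectrogram, all_pass, weights, speaker_id=0)
loss_masked = voiceid_style_loss(spectrogram, drop_noise, weights, speaker_id=0)
print(loss_all_pass, loss_masked)
```

In this toy setup, zeroing the noise-dominated bin lowers the speaker loss, which is the kind of signal that would push an enhancement network to ignore such time-frequency bins during training.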

