SDLGASFeb 5, 2024

Adversarial Data Augmentation for Robust Speaker Verification

arXiv:2402.02699v16 citationsh-index: 19ICCIP
Originality Incremental advance
AI Analysis

This work addresses robustness issues in speaker verification systems, which is important for applications like security and voice assistants, but it is incremental as it builds on existing data augmentation methods.

The paper tackles the problem of unwanted distortion from data augmentation in speaker verification by proposing adversarial data augmentation (A-DA), which combines data augmentation with adversarial learning to make speaker embeddings more robust; experiments on VoxCeleb and CN-Celeb datasets show that A-DA outperforms standard data augmentation in matched and mismatched test conditions.

Data augmentation (DA) has gained widespread popularity in deep speaker models due to its ease of implementation and significant effectiveness. It enriches training data by simulating real-life acoustic variations, enabling deep neural networks to learn speaker-related representations while disregarding irrelevant acoustic variations, thereby improving robustness and generalization. However, a potential issue with the vanilla DA is augmentation residual, i.e., unwanted distortion caused by different types of augmentation. To address this problem, this paper proposes a novel approach called adversarial data augmentation (A-DA) which combines DA with adversarial learning. Specifically, it involves an additional augmentation classifier to categorize various augmentation types used in data augmentation. This adversarial learning empowers the network to generate speaker embeddings that can deceive the augmentation classifier, making the learned speaker embeddings more robust in the face of augmentation variations. Experiments conducted on VoxCeleb and CN-Celeb datasets demonstrate that our proposed A-DA outperforms standard DA in both augmentation matched and mismatched test conditions, showcasing its superior robustness and generalization against acoustic variations.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes