AS, HC, LG, SD | Aug 6, 2020

Shouted Speech Compensation for Speaker Verification Robust to Vocal Effort Conditions

arXiv:2008.02487v1 | 1 citation
AI Analysis

This work addresses robustness in speaker verification for non-cooperative scenarios, but it is incremental as it adapts existing compensation techniques from automatic speech recognition.

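The paper tackles the degradation of speaker verification performance caused by vocal effort mismatch between enrollment and test, such as shouted versus normal speech. It first detects the shouted condition with logistic regression, then applies linear compensation to the embeddings, using Gaussian mixture models to cluster the shouted and normal speech domains. This yields up to a 13.8% relative improvement in equal error rate over a system that applies neither detection nor compensation.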
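Because the 13.8% figure is a relative improvement, not an absolute drop in equal error rate (EER), a short calculation may help keep the two apart. The sketch below assumes the usual definition, (baseline EER minus new EER) divided by baseline EER. The function name and the input numbers are hypothetical and are not taken from the paper.

```python
def relative_eer_improvement(eer_baseline: float, eer_system: float) -> float:
    """Relative EER improvement in percent.

    Both inputs must use the same unit (for example, percent).
    """
    if eer_baseline <= 0:
        raise ValueError("baseline EER must be positive")
    return 100.0 * (eer_baseline - eer_system) / eer_baseline


# Hypothetical EER values chosen for illustration only (not from the paper).
baseline_eer = 10.00      # system without detection or compensation
compensated_eer = 8.62    # system with detection and compensation

absolute_drop = baseline_eer - compensated_eer
relative_gain = relative_eer_improvement(baseline_eer, compensated_eer)
print(f"absolute drop: {absolute_drop:.2f} points, relative gain: {relative_gain:.1f}%")
```

With these illustrative inputs the absolute drop is 1.38 percentage points, while the relative improvement is far larger, which is why the two should not be confused when reading the headline result.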

The performance of speaker verification systems degrades when vocal effort conditions between enrollment and test (e.g., shouted vs. normal speech) are different. This is a potential situation in non-cooperative speaker verification tasks. In this paper, we present a study on different methods for linear compensation of embeddings making use of Gaussian mixture models to cluster shouted and normal speech domains. These compensation techniques are borrowed from the area of robustness for automatic speech recognition and, in this work, we apply them to compensate the mismatch between shouted and normal conditions in speaker verification. Before compensation, shouted condition is automatically detected by means of logistic regression. The process is computationally light and it is performed in the back-end of an x-vector system. Experimental results show that applying the proposed approach in the presence of vocal effort mismatch yields up to 13.8% equal error rate relative improvement with respect to a system that applies neither shouted speech detection nor compensation.
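The back-end flow described in the abstract (detect the shouted condition, then linearly compensate the embedding) can be sketched roughly as follows. The abstract does not give the compensation formula, so this sketch assumes one common form from ASR feature compensation: a posterior-weighted bias correction, where a diagonal-covariance Gaussian mixture model over shouted embeddings selects per-component bias vectors. All function names, the 2-dimensional toy embeddings, and every parameter value are hypothetical. In a real system the detector weights, the mixture, and the biases would be estimated from data, and the embeddings would be x-vectors.

```python
import math
from typing import List, Sequence


def shouted_probability(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Logistic-regression score P(shouted | x) for one embedding."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gmm_posteriors(x, weights, means, variances) -> List[float]:
    """Posterior p(k | x) under a diagonal-covariance GMM."""
    log_terms = []
    for pi_k, mu, var in zip(weights, means, variances):
        ll = math.log(pi_k)
        for xi, m, v in zip(x, mu, var):
            ll -= 0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        log_terms.append(ll)
    top = max(log_terms)  # subtract the maximum for numerical stability
    exps = [math.exp(t - top) for t in log_terms]
    total = sum(exps)
    return [e / total for e in exps]


def compensate(x, weights, means, variances, biases) -> List[float]:
    """Assumed compensation: x_hat = x + sum_k p(k | x) * r_k."""
    post = gmm_posteriors(x, weights, means, variances)
    return [xi + sum(p * r[d] for p, r in zip(post, biases))
            for d, xi in enumerate(x)]


def back_end(x, w, b, gmm, biases, threshold=0.5) -> List[float]:
    """Compensate only when the detector flags the embedding as shouted."""
    if shouted_probability(x, w, b) < threshold:
        return list(x)
    return compensate(x, *gmm, biases)


# Hypothetical toy parameters (not estimated from any real data).
W, B = [1.0, 1.0], -1.0
GMM = ([0.5, 0.5],                      # mixture weights
       [[2.0, 0.0], [0.0, 2.0]],        # component means (shouted domain)
       [[0.1, 0.1], [0.1, 0.1]])        # diagonal variances
BIASES = [[-2.0, 0.0], [0.0, -2.0]]     # per-component shift toward normal speech

print(back_end([2.0, 0.0], W, B, GMM, BIASES))  # flagged as shouted, so shifted
print(back_end([0.1, 0.1], W, B, GMM, BIASES))  # not flagged, returned unchanged
```

Running the detector first means embeddings from normal speech pass through untouched, and the per-embedding cost stays at one dot product plus a handful of Gaussian evaluations, which is in line with the abstract's description of the process as computationally light.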
