AS, CL, SD · Dec 22, 2024

Analysis of Speech Temporal Dynamics in the Context of Speaker Verification and Voice Anonymization

arXiv:2412.17164v1 · 13 citations · h-index: 24 · ICASSP
Originality: Incremental advance
AI Analysis

This work identifies a privacy risk in voice anonymization systems used in applications that require speaker protection. Its contribution is incremental, however, because it focuses on a single aspect of speech: phoneme durations.

The paper investigates how speech temporal dynamics, specifically phoneme durations, affect speaker verification and voice anonymization. Its experiments show that phoneme durations leak speaker identity in both original and anonymized speech.

In this paper, we investigate the impact of speech temporal dynamics on automatic speaker verification and speaker voice anonymization. We propose several metrics for performing automatic speaker verification based only on phoneme durations. Experimental results demonstrate that phoneme durations leak some speaker information and can reveal speaker identity from both original and anonymized speech. This work therefore emphasizes the importance of taking into account the speaker's speech rate and, more importantly, the speaker's phonetic duration characteristics, as well as the need to modify them in order to develop anonymization systems with strong privacy protection capacity.
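The abstract does not spell out the proposed metrics, so the following is only a hedged illustration of how verification from phoneme durations alone could work, not the paper's method. It builds a per-speaker profile of mean phoneme durations and scores a trial by the negative mean absolute log-ratio of durations over the phonemes both profiles share (0 means identical). The input format (lists of `(phoneme, duration)` pairs, e.g. from a forced aligner), the scoring rule, the `rate_normalize` option, and the `threshold` value are all assumptions. With `rate_normalize=True`, the mean log-ratio (a global tempo difference) is subtracted first, so the score reflects per-phoneme duration patterns rather than overall speech rate, echoing the abstract's distinction between the two.

```python
import math
from collections import defaultdict
from statistics import mean

# ASSUMPTION: each utterance is a list of (phoneme, duration_in_seconds) pairs,
# e.g. produced by a forced aligner. This format is hypothetical.


def duration_profile(utterances):
    """Map each phoneme to its mean duration over a speaker's utterances."""
    per_phone = defaultdict(list)
    for utt in utterances:
        for phone, dur in utt:
            per_phone[phone].append(dur)
    return {p: mean(ds) for p, ds in per_phone.items()}


def duration_score(enroll, test, rate_normalize=False):
    """Negative mean absolute log-ratio over shared phonemes (0 = identical).

    Hypothetical metric, not the one proposed in the paper.
    """
    shared = sorted(set(enroll) & set(test))
    if not shared:
        return float("-inf")  # no common phonemes: nothing to compare
    log_ratios = [math.log(enroll[p] / test[p]) for p in shared]
    if rate_normalize:
        # A uniform tempo change scales every duration by the same factor,
        # i.e. shifts every log-ratio by the same constant; remove it.
        offset = mean(log_ratios)
        log_ratios = [r - offset for r in log_ratios]
    return -mean(abs(r) for r in log_ratios)


def verify(enroll_utts, test_utts, threshold=-0.1, rate_normalize=False):
    """Accept the identity claim if the score reaches a (hypothetical) threshold."""
    score = duration_score(
        duration_profile(enroll_utts),
        duration_profile(test_utts),
        rate_normalize,
    )
    return score >= threshold, score


# Toy data (invented for illustration only).
spk_a_enroll = [[("AA", 0.12), ("S", 0.09), ("T", 0.05)], [("AA", 0.14), ("S", 0.10)]]
spk_a_test = [[("AA", 0.13), ("S", 0.095), ("T", 0.05)]]
spk_a_slow = [[(p, d * 1.5) for p, d in utt] for utt in spk_a_test]  # same speaker, slower
spk_b_test = [[("AA", 0.20), ("S", 0.06), ("T", 0.09)]]

print(verify(spk_a_enroll, spk_a_test)[0])  # → True
print(verify(spk_a_enroll, spk_b_test)[0])  # → False
print(verify(spk_a_enroll, spk_a_slow)[0])  # → False
print(verify(spk_a_enroll, spk_a_slow, rate_normalize=True)[0])  # → True
```

In this toy run, a uniformly slower recording of speaker A is rejected on raw durations but accepted once the global tempo offset is removed, while speaker B is rejected either way. This illustrates why an anonymizer that only changes overall speech rate may still leave per-phoneme duration patterns intact.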

