AS · AI · SP · Mar 6

StreamVoiceAnon+: Emotion-Preserving Streaming Speaker Anonymization via Frame-Level Acoustic Distillation

arXiv:2603.06079v1 · h-index: 22
Predicted impact: top 56% in AS · last 90 days · Originality: Incremental advance
AI Analysis

This work improves emotion preservation in streaming speaker anonymization, which matters for applications that need natural, emotionally expressive anonymized speech. It benefits users who need to protect their identity without losing paralinguistic information.

This paper tackles the problem of emotion degradation in streaming speaker anonymization, where neural audio codec language models tend to discard emotional information. The authors propose supervised finetuning with neutral-emotion utterance pairs and frame-level emotion distillation, achieving a 49.2% UAR for emotion preservation, a 24% relative improvement over the baseline, while maintaining strong privacy (EER 49.0%) and competitive intelligibility (5.77% WER).
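To make the idea of frame-level distillation concrete, here is a minimal sketch of one way such a loss could be computed. The summary does not specify the loss form, so the cosine-distance objective, the function names, and the assumption that student and teacher frames share the same dimensionality are all hypothetical choices for illustration, not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)  # epsilon avoids division by zero


def frame_level_distillation_loss(student_frames, teacher_frames):
    """Average (1 - cosine similarity) over time-aligned frames.

    student_frames: per-frame hidden states of the acoustic-token model
                    (hypothetical; assumed already projected to the teacher's size).
    teacher_frames: per-frame embeddings from an emotion teacher model
                    (hypothetical; the summary does not name the teacher).
    """
    if len(student_frames) != len(teacher_frames):
        raise ValueError("student and teacher must have the same number of frames")
    per_frame = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_frames, teacher_frames)
    ]
    return sum(per_frame) / len(per_frame)


# Toy example: two frames, two-dimensional features.
aligned = frame_level_distillation_loss([[1.0, 0.0], [0.0, 2.0]],
                                        [[2.0, 0.0], [0.0, 1.0]])
orthogonal = frame_level_distillation_loss([[1.0, 0.0]], [[0.0, 1.0]])
```

In this toy setup the loss is close to 0 when every student frame points in the same direction as its teacher frame and close to 1 when they are orthogonal. Because the supervision is applied per frame rather than per utterance, a term like this only affects training, which is consistent with the claim that inference latency is unchanged.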

We address the challenge of preserving emotional content in streaming speaker anonymization (SA). Neural audio codec language models trained for audio continuation tend to degrade source emotion: content tokens discard emotional information, and the model defaults to dominant acoustic patterns rather than preserving paralinguistic attributes. We propose supervised finetuning with neutral-emotion utterance pairs from the same speaker, combined with frame-level emotion distillation on acoustic token hidden states. All modifications are confined to finetuning, which takes less than 2 hours on 4 GPUs and adds zero inference latency overhead, while maintaining a competitive 180ms streaming latency. On the VoicePrivacy 2024 protocol, our approach achieves a 49.2% UAR (emotion preservation) with 5.77% WER (intelligibility), a +24% relative UAR improvement over the baseline (39.7%->49.2%) and +10% over the emotion-prompt variant (44.6% UAR), while maintaining strong privacy (EER 49.0%). Demo and code are available: https://anonymous3842031239.github.io/
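The reported gains are relative, not absolute, and UAR (unweighted average recall) is the mean of per-class recalls. The short sketch below reproduces the relative-improvement arithmetic from the abstract's figures and shows how UAR differs from plain accuracy on a toy label set. The function names and the toy labels are illustrative assumptions and are not taken from the paper's evaluation code.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        correct = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(correct / total)
    return sum(recalls) / len(recalls)


def relative_improvement(baseline, new):
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100.0


# UAR figures quoted in the abstract (percent).
baseline_uar, prompt_uar, proposed_uar = 39.7, 44.6, 49.2
rel_vs_baseline = relative_improvement(baseline_uar, proposed_uar)  # about 23.9
rel_vs_prompt = relative_improvement(prompt_uar, proposed_uar)      # about 10.3

# Toy labels (made up) showing that UAR is not the same as accuracy.
y_true = ["angry", "angry", "happy", "happy", "happy", "sad"]
y_pred = ["angry", "happy", "happy", "happy", "happy", "angry"]
toy_uar = unweighted_average_recall(y_true, y_pred)  # (0.5 + 1.0 + 0.0) / 3
toy_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

The absolute gain over the baseline is 9.5 UAR points, which corresponds to the roughly 24% relative figure. On the toy labels, accuracy is 4/6 while UAR is 0.5, because the missed minority class pulls the per-class average down.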
