AS | CL | LG | Sep 4, 2025

DarkStream: real-time speech anonymization with low latency

arXiv:2509.04667v1 | 6 citations | h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses privacy concerns in real-time speech communication by providing a low-latency anonymization solution, though it appears incremental as it builds on existing streaming synthesis and anonymization techniques.

The authors tackled real-time speaker anonymization in speech communication by proposing DarkStream, which achieved near-chance speaker verification performance (close to 50% EER) while maintaining linguistic intelligibility with a word error rate within 9%.
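To make the "near-chance" reading of a 50% EER concrete, the following is a minimal sketch of how an equal error rate can be computed from speaker verification scores. The function name, the threshold sweep, and the synthetic Gaussian scores are illustrative assumptions, not the paper's evaluation code. When target and non-target trials share the same score distribution, as strong anonymization aims for, the EER lands near 0.5.

```python
import random


def equal_error_rate(target_scores, nontarget_scores):
    """Return the EER: the point where false-accept and false-reject rates meet.

    Illustrative helper (not from the paper). A trial is accepted when
    its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False-accept rate: non-target trials wrongly accepted
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # False-reject rate: target trials wrongly rejected
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Perfectly separable scores: the verifier never errs.
separable_eer = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])

# Synthetic scores drawn from one distribution for both trial types:
# the verifier cannot tell speakers apart, so the EER is close to 0.5.
rng = random.Random(0)
same_dist_target = [rng.gauss(0.0, 1.0) for _ in range(500)]
same_dist_nontarget = [rng.gauss(0.0, 1.0) for _ in range(500)]
chance_eer = equal_error_rate(same_dist_target, same_dist_nontarget)
```

An EER of 0 means the attacker's verifier separates speakers perfectly, while a value near 0.5 means its decisions are no better than guessing.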

We propose DarkStream, a streaming speech synthesis model for real-time speaker anonymization. To improve content encoding under strict latency constraints, DarkStream combines a causal waveform encoder, a short lookahead buffer, and transformer-based contextual layers. To further reduce inference time, the model generates waveforms directly via a neural vocoder, thus removing intermediate mel-spectrogram conversions. Finally, DarkStream anonymizes speaker identity by injecting a GAN-generated pseudo-speaker embedding into linguistic features from the content encoder. Evaluations show our model achieves strong anonymization, yielding close to 50% speaker verification EER (near-chance performance) under the lazy-informed attack scenario, while maintaining acceptable linguistic intelligibility (WER within 9%). By balancing low latency, robust privacy, and minimal intelligibility degradation, DarkStream provides a practical solution for privacy-preserving real-time speech communication.
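The role of the short lookahead buffer can be sketched as follows: a chunk is handed to the encoder only once a few future samples have arrived, trading a small fixed delay for extra context. This is a simplified illustration, not DarkStream's implementation. The function names, the chunk and lookahead sizes, and the 16 kHz sample rate are all hypothetical values chosen for the example.

```python
def stream_with_lookahead(samples, chunk, lookahead):
    """Yield (current_chunk, future_context) pairs from a sample stream.

    Hypothetical helper: a chunk is emitted only after `lookahead`
    future samples have been received, mimicking a lookahead buffer.
    """
    buffer = []
    for s in samples:
        buffer.append(s)
        # Emit a chunk as soon as its full lookahead is available
        while len(buffer) >= chunk + lookahead:
            yield buffer[:chunk], buffer[chunk:chunk + lookahead]
            del buffer[:chunk]
    # End of stream: flush what is left with whatever lookahead remains
    while buffer:
        yield buffer[:chunk], buffer[chunk:chunk + lookahead]
        del buffer[:chunk]


def algorithmic_latency_ms(chunk, lookahead, sample_rate):
    """Delay before the first chunk can be processed, in milliseconds."""
    return 1000.0 * (chunk + lookahead) / sample_rate


# Toy stream of 10 samples, chunks of 4 with 2 samples of lookahead
pairs = list(stream_with_lookahead(range(10), chunk=4, lookahead=2))

# Assumed sizes: 320-sample chunks plus 160 samples of lookahead at 16 kHz
latency = algorithmic_latency_ms(chunk=320, lookahead=160, sample_rate=16000)
```

In this sketch the lookahead adds directly to the algorithmic latency, which is why a streaming model would keep it short.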

