SD CR LG AS · Jul 11, 2022

Speaker Anonymization with Phonetic Intermediate Representations

Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu

arXiv:2207.04834v1 · 13.835 citations · h-index: 38 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses privacy concerns in speech processing by protecting speaker identity, though it appears incremental because it builds on existing challenge frameworks.

The authors tackled speaker anonymization by generating speech from phonetic transcriptions and anonymized speaker embeddings, achieving significant improvements in privacy robustness over the Voice Privacy Challenge 2020 baselines while maintaining high intelligibility and naturalness.
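The summary describes a four-stage flow: recognize phones, extract a speaker embedding, anonymize it, and synthesize speech from the phones plus the anonymized embedding. The following is a minimal sketch of that wiring only, assuming each stage is an injected callable. The class `AnonymizationPipeline`, its field names and the toy stand-ins are hypothetical; the actual system uses trained ASR, speaker-encoder and TTS models that are not shown here.

```python
from dataclasses import dataclass
from typing import Callable, List

Audio = List[float]        # waveform samples
Phones = List[str]         # phone symbols (e.g. IPA)
Embedding = List[float]    # speaker vector


@dataclass
class AnonymizationPipeline:
    """Hypothetical wiring of the four stages; each stage is an injected callable."""
    recognize_phones: Callable[[Audio], Phones]
    extract_embedding: Callable[[Audio], Embedding]
    anonymize_embedding: Callable[[Embedding], Embedding]
    synthesize: Callable[[Phones, Embedding], Audio]

    def run(self, audio: Audio) -> Audio:
        phones = self.recognize_phones(audio)          # ASR output (may contain errors)
        original = self.extract_embedding(audio)       # speaker identity of the input
        pseudo = self.anonymize_embedding(original)    # replaced by a pseudo-speaker vector
        # Only the phone sequence and the anonymized vector reach the synthesizer;
        # the input waveform itself is never passed on.
        return self.synthesize(phones, pseudo)


if __name__ == "__main__":
    # Toy stand-ins so the wiring can be exercised without any models.
    pipeline = AnonymizationPipeline(
        recognize_phones=lambda audio: ["h", "ə", "l", "oʊ"],
        extract_embedding=lambda audio: [sum(audio) / len(audio)] * 3,
        anonymize_embedding=lambda emb: [-x for x in emb],
        synthesize=lambda phones, emb: [sum(emb)] * len(phones),
    )
    print(pipeline.run([1.0] * 16000))  # [-3.0, -3.0, -3.0, -3.0]
```

Keeping the stages as plain callables makes the privacy argument visible in one place: `run` is the only path from input to output, and it forwards just the phones and the anonymized vector.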

In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings. Using phones as the intermediate representation ensures near-complete elimination of speaker identity information from the input while preserving the original phonetic content as much as possible. Our experimental results on the LibriSpeech and VCTK corpora reveal two key findings: 1) although automatic speech recognition produces imperfect transcriptions, our neural speech synthesis system can handle such errors, making our system feasible and robust; and 2) combining speaker embeddings from different resources is beneficial, and their appropriate normalization is crucial. Overall, our final best system significantly outperforms the baselines provided in the Voice Privacy Challenge 2020 in terms of privacy robustness against a lazy-informed attacker while maintaining high intelligibility and naturalness of the anonymized speech.
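The abstract reports that combining speaker embeddings from different resources helps and that normalization is crucial, but it does not spell out the scheme. The following is a minimal sketch under stated assumptions, not the authors' method. It L2-normalizes each encoder's vector before concatenation so that neither encoder dominates by scale. It then builds a pseudo-speaker with a pool-based selection in the style of the Voice Privacy Challenge 2020 baseline (average a random subset of the pool vectors farthest from the original), using cosine distance in place of PLDA scoring. Finally, it rescales the average to the original vector's norm. All function names and parameter values (`n_farthest`, `n_average`, `seed`) are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def l2_normalize(v: Sequence[float]) -> Vector:
    n = norm(v)
    return [x / n for x in v] if n > 0 else list(v)


def combine_embeddings(*parts: Sequence[float]) -> Vector:
    """Concatenate vectors from several speaker encoders, L2-normalizing each first
    so that no encoder dominates merely because of its output scale (assumed scheme)."""
    combined: Vector = []
    for part in parts:
        combined.extend(l2_normalize(part))
    return combined


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumes non-zero vectors of equal length.
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))


def anonymize(original: Sequence[float], pool: Sequence[Sequence[float]],
              n_farthest: int = 4, n_average: int = 2, seed: int = 0) -> Vector:
    """Pool-based pseudo-speaker: average a random subset of the pool vectors farthest
    from the original (cosine distance here, swapped in for PLDA scoring)."""
    ranked = sorted(pool, key=lambda p: cosine_distance(original, p), reverse=True)
    chosen = random.Random(seed).sample(ranked[:n_farthest], n_average)
    mean = [sum(column) / n_average for column in zip(*chosen)]
    # Averaging shrinks the vector; rescale it to the original's norm so the pseudo
    # embedding lies on the same scale as real ones (an assumed normalization step).
    scale = norm(original) / norm(mean)
    return [x * scale for x in mean]


if __name__ == "__main__":
    rng = random.Random(1)
    # Toy pool: each entry joins a 3-dim and a 2-dim output from two hypothetical encoders.
    pool = [combine_embeddings([rng.gauss(0, 1) for _ in range(3)],
                               [rng.gauss(0, 1) for _ in range(2)])
            for _ in range(10)]
    speaker = combine_embeddings([0.5, 1.0, -0.2], [3.0, 4.0])
    pseudo = anonymize(speaker, pool)
    print(len(pseudo), round(norm(pseudo), 4), round(norm(speaker), 4))
```

Without per-encoder normalization, an encoder whose raw vectors have a larger magnitude would dominate both the concatenation and the cosine ranking. That is one plausible reading of why the abstract calls normalization crucial.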

