Nils Peters

AS
h-index40
3papers
3citations
Novelty43%
AI Score32

3 Papers

ASJun 11, 2025
You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks

Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi et al.

Speaker anonymization systems hide the identity of speakers while preserving other information such as linguistic content and emotions. To evaluate their privacy benefits, attacks in the form of automatic speaker verification (ASV) systems are employed. In this study, we assess the impact of intra-speaker linguistic content similarity in the attacker training and evaluation datasets, by adapting BERT, a language model, as an ASV system. On the VoicePrivacy Attacker Challenge datasets, our method achieves a mean equal error rate (EER) of 35%, with certain speakers attaining EERs as low as 2%, based solely on the textual content of their utterances. Our explainability study reveals that the system decisions are linked to semantically similar keywords within utterances, stemming from how LibriSpeech is curated. Our study suggests reworking the VoicePrivacy datasets to ensure a fair and unbiased evaluation and challenge the reliance on global EER for privacy evaluations.

ASDec 9, 2021
On The Effect Of Coding Artifacts On Acoustic Scene Classification

Nagashree K. S. Rao, Nils Peters

Previous DCASE challenges contributed to an increase in the performance of acoustic scene classification systems. State-of-the-art classifiers demand significant processing capabilities and memory which is challenging for resource-constrained mobile or IoT edge devices. Thus, it is more likely to deploy these models on more powerful hardware and classify audio recordings previously uploaded (or streamed) from low-power edge devices. In such scenario, the edge device may apply perceptual audio coding to reduce the transmission data rate. This paper explores the effect of perceptual audio coding on the classification performance using a DCASE 2020 challenge contribution [1]. We found that classification accuracy can degrade by up to 57% compared to classifying original (uncompressed) audio. We further demonstrate how lossy audio compression techniques during model training can improve classification accuracy of compressed audio signals even for audio codecs and codec bitrates not included in the training process.

SDOct 17, 2021
Storage and Authentication of Audio Footage for IoAuT Devices Using Distributed Ledger Technology

Srivatsav Chenna, Nils Peters

Detection of fabricated or manipulated audio content to prevent, e.g., distribution of forgeries in digital media, is crucial, especially in political and reputational contexts. Better tools for protecting the integrity of media creation are desired. Within the paradigm of the Internet of Audio Things(IoAuT), we discuss the ability of the IoAuT network to verify the authenticity of original audio using distributed ledger technology. By storing audio recordings in combination with associated recording-specific metadata obtained by the IoAuT capturing device, this architecture enables secure distribution of original audio footage, authentication of unknown audio content, and referencing of original audio material in future derivative works. By developing a proof-of-concept system, the feasibility of the proposed architecture is evaluated and discussed.