CL AI | Oct 15, 2025

Personal Attribute Leakage in Federated Speech Models

Hamdan Al-Ali, Ali Reza Ghavamipour, Tommaso Caselli, Fatih Turkmen, Zeerak Talat, Hanan Aldarmaki

arXiv:2510.13357v1 | 2.7 | h-index: 23

Originality: Incremental advance

AI Analysis

This work exposes previously undocumented privacy risks in federated speech models, a concern for users and developers who rely on federated learning for privacy-preserving AI. The advance is incremental, since it builds on existing attack methods.

The paper analyzes the vulnerability of federated ASR models to attribute inference attacks. It demonstrates that sensitive attributes (gender, age, accent, emotion, and dysarthria) can be inferred from weight differentials alone, without access to raw speech data. Accent is reliably inferred across all tested models.

Federated learning is a common method for privacy-preserving training of machine learning models. In this paper, we analyze the vulnerability of ASR models to attribute inference attacks in the federated setting. We test a non-parametric white-box attack method under a passive threat model on three ASR models: Wav2Vec2, HuBERT, and Whisper. The attack operates solely on weight differentials without access to raw speech from target speakers. We demonstrate attack feasibility on sensitive demographic and clinical attributes: gender, age, accent, emotion, and dysarthria. Our findings indicate that attributes that are underrepresented or absent in the pre-training data are more vulnerable to such inference attacks. In particular, information about accents can be reliably inferred from all models. Our findings expose previously undocumented vulnerabilities in federated ASR models and offer insights towards improved security.
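To make the idea of an attack that "operates solely on weight differentials" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the non-parametric attack can be approximated by a k-nearest-neighbour vote over flattened weight differentials, using shadow updates with known attribute labels. The function names (`weight_differential`, `knn_predict`, `demo`), the cosine-similarity metric, and the synthetic two-layer "model" are all hypothetical choices made for illustration only.

```python
import numpy as np

def weight_differential(global_weights, client_weights):
    """Flatten the per-layer (client - global) differences into one vector."""
    return np.concatenate(
        [(client_weights[k] - global_weights[k]).ravel() for k in sorted(global_weights)]
    )

def knn_predict(shadow_deltas, shadow_labels, target_delta, k=3):
    """Majority vote among the k shadow updates most similar to the target (cosine)."""
    a = shadow_deltas / np.linalg.norm(shadow_deltas, axis=1, keepdims=True)
    b = target_delta / np.linalg.norm(target_delta)
    nearest = np.argsort(-(a @ b))[:k]
    values, counts = np.unique(shadow_labels[nearest], return_counts=True)
    return values[np.argmax(counts)]

def demo(seed=0):
    """Toy run on synthetic weights; no real ASR model or speech data is involved."""
    rng = np.random.default_rng(seed)
    shapes = {"w": (4, 3), "b": (3,)}
    global_w = {k: rng.normal(size=s) for k, s in shapes.items()}
    # Assumption: each attribute class pushes the weights in its own direction.
    drift = {c: {k: rng.normal(size=s) for k, s in shapes.items()} for c in (0, 1)}

    def client_update(label):
        # Simulated fine-tuned weights: global + class drift + small noise.
        return {
            k: global_w[k] + drift[label][k] + 0.05 * rng.normal(size=s)
            for k, s in shapes.items()
        }

    shadow_labels = np.array([0, 0, 0, 1, 1, 1])
    shadow = np.stack(
        [weight_differential(global_w, client_update(int(l))) for l in shadow_labels]
    )
    target_label = 1
    target = weight_differential(global_w, client_update(target_label))
    return knn_predict(shadow, shadow_labels, target, k=3), target_label
```

In this sketch the attacker only ever sees the difference between the global weights and a client's updated weights, which mirrors the passive, white-box setting described in the abstract. The shadow updates stand in for updates the attacker could produce from speakers with known attributes.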
