CL, AI · Oct 15, 2025

Personal Attribute Leakage in Federated Speech Models

arXiv:2510.13357v1 · h-index: 23
Originality: Incremental advance
AI Analysis

This paper exposes previously undocumented privacy risks in federated speech models, a concern for users and developers who rely on federated learning for privacy-preserving AI. The advance is incremental because it builds on existing attack methods.

The paper analyzes the vulnerability of federated automatic speech recognition (ASR) models to attribute inference attacks. It demonstrates that sensitive attributes such as gender, age, accent, emotion, and dysarthria can be inferred from weight differentials alone, without raw speech data, and that accent is reliably inferred across all three tested models.

Federated learning is a common method for privacy-preserving training of machine learning models. In this paper, we analyze the vulnerability of ASR models to attribute inference attacks in the federated setting. We test a non-parametric white-box attack method under a passive threat model on three ASR models: Wav2Vec2, HuBERT, and Whisper. The attack operates solely on weight differentials without access to raw speech from target speakers. We demonstrate attack feasibility on sensitive demographic and clinical attributes: gender, age, accent, emotion, and dysarthria. Our findings indicate that attributes that are underrepresented or absent in the pre-training data are more vulnerable to such inference attacks. In particular, information about accents can be reliably inferred from all models. Our findings expose previously undocumented vulnerabilities in federated ASR models and offer insights towards improved security.
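To make the idea of an attack on weight differentials concrete, the following is a minimal sketch, not the paper's implementation. The abstract only says the attack is non-parametric, white-box, and passive; the choice of k-nearest neighbours with cosine distance, the use of labelled "shadow" updates, and all function names (`weight_differential`, `knn_infer`, `simulate_client_update`, `toy_demo`) are assumptions made for illustration. The toy update simulator is a hypothetical stand-in for local fine-tuning of a real ASR model.

```python
import math
import random
from collections import Counter


def weight_differential(global_w, client_w):
    """Element-wise difference between a client's updated weights and the global weights."""
    return [c - g for g, c in zip(global_w, client_w)]


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def knn_infer(target_diff, shadow_diffs, shadow_labels, k=3):
    """Assumed attack: majority vote over the k shadow differentials closest to the target."""
    ranked = sorted(
        range(len(shadow_diffs)),
        key=lambda i: cosine_distance(target_diff, shadow_diffs[i]),
    )
    votes = Counter(shadow_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def simulate_client_update(global_w, attribute, rng):
    """Hypothetical toy update: each attribute value shifts a different half of the weights."""
    half = len(global_w) // 2
    updated = []
    for i, g in enumerate(global_w):
        shift = 1.0 if (attribute == "accent_a") == (i < half) else 0.0
        updated.append(g + shift + rng.gauss(0.0, 0.05))
    return updated


def toy_demo(seed=0):
    """Build labelled shadow differentials, then infer the attribute of an unseen client."""
    rng = random.Random(seed)
    global_w = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    shadow_diffs, shadow_labels = [], []
    for label in ("accent_a", "accent_b"):
        for _ in range(5):
            client_w = simulate_client_update(global_w, label, rng)
            shadow_diffs.append(weight_differential(global_w, client_w))
            shadow_labels.append(label)
    # The attacker sees only the target's weights, never the underlying speech.
    target_w = simulate_client_update(global_w, "accent_b", rng)
    target_diff = weight_differential(global_w, target_w)
    return knn_infer(target_diff, shadow_diffs, shadow_labels)
```

Calling `toy_demo()` runs the whole pipeline on synthetic 8-dimensional weights. In a real setting the differentials would come from full model checkpoints and would likely need dimensionality reduction or per-layer selection before any distance computation, which this sketch omits.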

