SD · MM · May 4

Private Speech Classification without Collapse: Stabilized DP Training and Offline Distillation

Yadi Wen, Tianxin Li, Enji Liang, Rong Du, Yue Fu

arXiv:2605.02718 · 9.6

Predicted impact: top 90% in SD · last 90 days
Originality: Incremental advance

AI Analysis

For practitioners deploying private speech classifiers with strong privacy guarantees, this work addresses two critical issues, training collapse and modality mismatch, enabling practical private audio-only models.

The paper identifies a collapse failure mode of DP-SGD in private speech classification under strong privacy (ε ≤ 1), where training degrades to near single-class prediction. It proposes a two-stage protocol that combines stabilized DP training with offline distillation to an audio-only student, and reports improved Macro-F1 and balanced accuracy while maintaining the privacy guarantee.
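To make the collapse failure mode concrete, the following minimal sketch shows how overall accuracy can hide a single-class predictor on an imbalanced task while Macro-F1 and balanced accuracy expose it. The function names and the `top_class_share` diagnostic are assumptions for illustration; the paper's own collapse diagnostic is not specified in this listing.

```python
from collections import Counter


def per_class_recall_f1(y_true, y_pred, classes):
    """Return {class: (recall, f1)} computed from raw label lists."""
    stats = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        stats[c] = (recall, f1)
    return stats


def summarize(y_true, y_pred, classes):
    stats = per_class_recall_f1(y_true, y_pred, classes)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    balanced_accuracy = sum(r for r, _ in stats.values()) / len(classes)
    macro_f1 = sum(f for _, f in stats.values()) / len(classes)
    # Hypothetical collapse diagnostic (an assumption, not the paper's):
    # the share of predictions that fall in the most-predicted class.
    top_class_share = max(Counter(y_pred).values()) / len(y_pred)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
        "macro_f1": macro_f1,
        "top_class_share": top_class_share,
    }


# A 90/10 imbalanced task where the model always predicts the majority class.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
report = summarize(y_true, y_pred, classes=[0, 1])
```

On this toy input, accuracy is 0.9 even though the minority class is never predicted; balanced accuracy falls to 0.5 and `top_class_share` reaches 1.0, which is the kind of signal a collapse check would look for.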

We study example-level private supervised speech classification under a practical release constraint: training may access privileged side information, but the released model must be audio-only. This setting is important because speech systems can often exploit richer side information during development, whereas deployment and release require a lightweight unimodal model with auditable privacy guarantees. Using DP-SGD on the private dataset $D_{\text{priv}}$, we identify a strong-privacy failure mode ($\varepsilon \le 1$) on imbalanced tasks, where training may collapse to a near single-class predictor, a phenomenon that overall accuracy can obscure. We therefore emphasize Macro-F1, balanced accuracy, and a simple collapse diagnostic. This failure is especially problematic in our release setting because a collapsed private teacher cannot provide useful supervision for the downstream audio-only student. To address this setting under strong privacy, we propose a two-stage protocol: (i) train a (possibly multimodal) DP teacher on $D_{\text{priv}}$, and (ii) distill an audio-only student on a fixed, recording-disjoint auxiliary dataset $D_{\text{aux}}$ using one-shot offline teacher probability outputs, releasing only the student. The DP guarantee applies only to $D_{\text{priv}}$; we make no DP claim for $D_{\text{aux}}$, and privacy of the released student with respect to $D_{\text{priv}}$ follows by post-processing. We frame this setting as involving four coupled bottlenecks: speech-induced optimization instability under DP-SGD, minority-class erosion under clipping and noise, teacher over-reliance on privileged modalities unavailable at deployment, and train-deploy modality mismatch. We address them with a DP-stabilizing acoustic front-end (DSAF), minibatch-adaptive bounded loss reweighting (AW-DP), privileged-modality dropout, and offline teacher-to-student distillation.
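The two stages can be illustrated with a minimal, standard-library sketch: a generic DP-SGD gradient aggregation (per-example clipping plus Gaussian noise) for the teacher, and a soft-label cross-entropy for distilling the student from fixed teacher probabilities on $D_{\text{aux}}$. This is textbook DP-SGD and distillation, not the paper's DSAF or AW-DP components, and it performs no privacy accounting; all function and parameter names (`max_norm`, `noise_multiplier`, and so on) are assumptions for illustration.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Stage (i): clip each example's gradient, sum, add Gaussian noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip_to_norm(grad, max_norm)):
            total[i] += g
    # Noise standard deviation is tied to the clipping bound (the sensitivity).
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [v / len(per_example_grads) for v in noisy]


def soft_cross_entropy(teacher_probs, student_logits):
    """Stage (ii): loss of the student against fixed teacher probabilities.

    The teacher probabilities are assumed to be computed once, offline, on
    D_aux; the student never reads D_priv directly.
    """
    m = max(student_logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in student_logits))
    return -sum(p * (z - log_z) for p, z in zip(teacher_probs, student_logits))


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients with norms 5.0 and 0.5

# With the noise switched off, only the clipping step is visible.
noiseless = dp_sgd_mean_gradient(grads, max_norm=1.0, noise_multiplier=0.0, rng=rng)

# With noise on, the averaged gradient is perturbed in every coordinate.
noisy = dp_sgd_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)

# A uniform teacher distribution against uniform student logits gives log(2).
loss = soft_cross_entropy([0.5, 0.5], [0.0, 0.0])
```

In the noiseless call, the first gradient is scaled down to norm 1 while the second is left untouched, which hints at why clipping combined with noise can erode the signal from rare classes: the contribution of each example is capped before noise of fixed scale is added.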
