CV · Mar 22

Privacy-Preserving Federated Action Recognition via Differentially Private Selective Tuning and Efficient Communication

Idris Zakariyya, Pai Chet Ng, Kaushik Bhargav Sivangi, S. Mohammad Sheikholeslami, Konstantinos N. Plataniotis, Fani Deligianni

arXiv:2603.21305 · 51.4 · h-index: 8 · Has Code

Predicted impact: top 68% in CV · last 90 days · Originality: Incremental advance

AI Analysis

This work addresses privacy risks and bandwidth costs in federated learning for video analysis, although it is an incremental advance that builds on existing federated learning and differential privacy methods.

The paper tackles privacy and communication challenges in federated video action recognition by proposing FedDP-STECAR, which selectively fine-tunes and perturbs a small subset of layers under differential privacy. It reports up to 70.2% higher accuracy under strict privacy in centralized settings and a reduction in communication traffic of over 99%.
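The communication saving comes from uploading only the tuned layers instead of the full model. The sketch below shows that arithmetic, assuming traffic is proportional to the number of parameters sent (ignoring protocol overhead and compression). The layer names, the 36M "backbone" count, the 768-dimensional embedding, and the helper `upload_fraction` are hypothetical illustrations, not the paper's actual MViT-B-16x4 layer sizes or its choice of tuned layers.

```python
# Illustrative parameter counts only: these are NOT the real MViT-B-16x4 layer sizes.
# "head" assumes a linear classifier over the 101 UCF-101 classes on a hypothetical
# 768-dimensional embedding; "backbone" is a made-up round number.
layer_params = {
    "backbone": 36_000_000,
    "head": 768 * 101 + 101,
}


def upload_fraction(layer_params, tuned_layers):
    """Fraction of all parameters a client sends when it uploads only the tuned layers."""
    total = sum(layer_params.values())
    sent = sum(layer_params[name] for name in tuned_layers)
    return sent / total


frac = upload_fraction(layer_params, tuned_layers=["head"])
print(f"uploaded {frac:.3%} of parameters, {1 - frac:.2%} less traffic than a full-model update")
```

With a large frozen backbone and a small tuned subset, the uploaded fraction drops well below 1%, which is the regime in which a saving of "over 99%" becomes possible.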

Federated video action recognition enables collaborative model training without sharing raw video data, yet it remains vulnerable to two key challenges: model exposure and communication overhead. Gradients exchanged between clients and the server can leak private motion patterns, while full-model synchronization of high-dimensional video networks incurs substantial bandwidth and communication costs. To address these issues, we propose Federated Differential Privacy with Selective Tuning and Efficient Communication for Action Recognition (FedDP-STECAR). FedDP-STECAR selectively fine-tunes and perturbs only a small subset of task-relevant layers under Differential Privacy (DP), shrinking the surface exposed to information leakage while preserving temporal coherence in video features. Because only the tuned layers are transmitted during aggregation, communication traffic is reduced by over 99% compared with full-model updates. Experiments on the UCF-101 dataset with the MViT-B-16x4 transformer show that FedDP-STECAR achieves up to 70.2% higher accuracy under strict privacy (ε = 0.65) in centralized settings, and 48% faster training with 73.1% accuracy in federated setups, enabling scalable, privacy-preserving video action recognition. Code available at https://github.com/izakariyya/mvit-federated-videodp
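The abstract does not spell out the exact DP mechanism, so the following is only a minimal sketch of the general idea under stated assumptions: a DP-SGD-style step (each per-example gradient clipped to norm C, Gaussian noise with standard deviation σ·C added to the clipped sum) applied only to the tuned layers, followed by unweighted FedAvg over just those layers on the server. The helpers `clip`, `dp_update`, `client_step` and `server_aggregate`, the layer names and all hyperparameters are hypothetical. Per-example gradients are supplied as toy numbers rather than computed from a model, and no privacy accountant is included, so the sketch does not by itself yield an ε value such as the one reported above.

```python
import math
import random


def clip(vec, max_norm):
    """Rescale one per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [v * scale for v in vec]


def dp_update(per_example_grads, max_norm, noise_multiplier, rng):
    """DP-SGD-style noisy mean gradient for one tuned layer (hypothetical helper).

    Each per-example gradient is clipped, the clipped gradients are summed,
    Gaussian noise with std noise_multiplier * max_norm is added, and the
    result is divided by the batch size.
    """
    n = len(per_example_grads)
    summed = [0.0] * len(per_example_grads[0])
    for g in per_example_grads:
        for i, v in enumerate(clip(g, max_norm)):
            summed[i] += v
    sigma = noise_multiplier * max_norm
    return [(s + rng.gauss(0.0, sigma)) / n for s in summed]


def client_step(weights, grads_by_layer, tuned_layers, lr, max_norm, noise_multiplier, rng):
    """Hypothetical client update: only tuned layers are trained, perturbed and uploaded."""
    upload = {}
    for name in tuned_layers:
        noisy = dp_update(grads_by_layer[name], max_norm, noise_multiplier, rng)
        upload[name] = [w - lr * g for w, g in zip(weights[name], noisy)]
    return upload  # frozen layers are never uploaded


def server_aggregate(global_weights, client_uploads):
    """Unweighted FedAvg over the uploaded layers; all other layers stay as they are."""
    new_weights = dict(global_weights)
    for name in client_uploads[0]:
        columns = zip(*(upload[name] for upload in client_uploads))
        new_weights[name] = [sum(col) / len(client_uploads) for col in columns]
    return new_weights


# Toy round with three simulated clients; "backbone" is frozen, "head" is tuned.
rng = random.Random(0)
global_w = {"backbone": [0.5, -0.2], "head": [0.0, 0.0, 0.0]}
uploads = []
for _ in range(3):
    # Stand-in per-example gradients for the head (8 examples, 3 parameters each).
    grads = {"head": [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(8)]}
    uploads.append(client_step(global_w, grads, ["head"], lr=0.1,
                               max_norm=1.0, noise_multiplier=1.1, rng=rng))
global_w = server_aggregate(global_w, uploads)
print(global_w)
```

In this sketch the frozen backbone is neither perturbed nor transmitted, so raising `noise_multiplier` trades accuracy of the head update for stronger privacy without touching the rest of the network.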
