Breaching FedMD: Image Recovery via Paired-Logits Inversion Attack
This work reveals a substantial privacy vulnerability in FedMD, a collaborative learning scheme. The contribution is incremental, but the security flaw it exposes is critical.
The paper examines the privacy risk in Federated Learning with Model Distillation (FedMD) and shows that sharing output logits can still expose private data. It demonstrates that a Paired-Logits Inversion attack can reconstruct private images with a high success rate on facial recognition datasets.
Federated Learning with Model Distillation (FedMD) is a nascent collaborative learning paradigm in which only the output logits on public datasets are transmitted as distilled knowledge. Private model parameters, which are susceptible to gradient inversion attacks (a known privacy risk in federated learning), are never shared. In this paper, we find that although sharing output logits on public datasets is safer than directly sharing gradients, carefully designed malicious attacks still pose a substantial risk of data exposure. Our study shows that a malicious server can mount a Paired-Logits Inversion (PLI) attack against FedMD and its variants by training an inversion neural network that exploits the confidence gap between the server and client models. Experiments on multiple facial recognition datasets validate that, under FedMD-like schemes, a malicious server using only paired server-client logits on public datasets can reconstruct private images on all tested benchmarks with a high success rate.
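To make the setting concrete, the following is a minimal sketch of the information a FedMD-style server sees. It is not the authors' implementation. It assumes the server forms a consensus by averaging client logits on the public set, that "paired logits" can be represented by concatenating server and client logits for the same public sample, and that the confidence gap can be approximated as the difference in top softmax probability. The paper's exact definitions may differ. All function names are hypothetical, and random arrays stand in for real model outputs.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def aggregate_logits(client_logits):
    # Assumed consensus rule: average the clients' logits on the public set.
    return np.mean(np.stack(client_logits, axis=0), axis=0)

def confidence_gap(server_logits, client_logits):
    # Illustrative proxy: client top-class probability minus server's,
    # computed per public sample.
    client_conf = softmax(client_logits).max(axis=1)
    server_conf = softmax(server_logits).max(axis=1)
    return client_conf - server_conf

# Stand-in data: 4 clients, 5 public samples, 3 classes.
rng = np.random.default_rng(0)
n_public, n_classes = 5, 3
clients = [rng.normal(size=(n_public, n_classes)) for _ in range(4)]
server = rng.normal(size=(n_public, n_classes))

consensus = aggregate_logits(clients)

# Assumed input format for an inversion network: server and client
# logits for the same public sample, placed side by side.
pairs = np.concatenate([server, clients[0]], axis=1)
gap = confidence_gap(server, clients[0])
```

Everything computed here comes from public-set logits alone, which is the only signal the abstract says the malicious server needs.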