Fed-Listing: Federated Label Distribution Inference in Graph Neural Networks
This work addresses a privacy threat to users in federated graph learning, a setting where label distribution inference had been largely unexplored. The contribution is incremental, however, because it extends existing gradient-based attacks to graph-specific contexts.
The paper tackles label distribution inference in Federated Graph Neural Networks (FedGNNs), where shared gradients can leak sensitive information. It introduces Fed-Listing, a gradient-based attack that infers private label statistics from final-layer gradients without access to raw data. Experiments on four benchmark datasets and three GNN architectures show that Fed-Listing significantly outperforms existing baselines, even under non-i.i.d. scenarios, and that it remains effective against defense mechanisms unless model utility is severely degraded.
Graph Neural Networks (GNNs) have been studied intensively for their expressive representations and strong learning performance on graph-structured data, which enable effective modeling of complex relational dependencies among nodes and edges in various domains. However, standalone GNNs expose threat surfaces and raise privacy concerns, because sensitive graph-structured data is collected and processed in a centralized setting. To address this issue, Federated Graph Neural Networks (FedGNNs) have been proposed to facilitate collaborative learning over decentralized local graph data while preserving user privacy. Yet emerging research indicates that even in these settings, shared model updates, particularly gradients, can unintentionally leak sensitive information about local users. Numerous privacy inference attacks have been explored in traditional federated learning and extended to graph settings, but label distribution inference in FedGNNs remains largely underexplored. In this work, we introduce Fed-Listing (Federated Label Distribution Inference in GNNs), a novel gradient-based attack designed to infer the private label statistics of target clients in FedGNNs without access to raw data or node features. Fed-Listing leverages only the final-layer gradients exchanged during training to uncover statistical patterns that reveal class proportions in a stealthy manner. An auxiliary shadow dataset is partitioned under diverse label partitioning strategies to simulate various client distributions, and the attack model is trained on these simulated clients. Extensive experiments on four benchmark datasets and three GNN architectures show that Fed-Listing significantly outperforms existing baselines, including random guessing and Decaf, even under challenging non-i.i.d. scenarios. Moreover, defense mechanisms barely reduce the attack's performance unless the model's utility is severely degraded.
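To make concrete why final-layer gradients can reveal class proportions, the following is a minimal sketch of the underlying leakage, not the Fed-Listing attack itself. It assumes the final layer is a linear classifier with a bias term trained with mean softmax cross-entropy, in which case the bias gradient equals the mean predicted probability minus the label proportion for each class. The function names, the synthetic logits, and the attacker's assumption that predictions are roughly uniform early in training are all hypothetical choices made for illustration. Fed-Listing, as described above, instead trains an attack model on shadow partitions.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def final_layer_bias_gradient(logits, labels, num_classes):
    """Gradient of mean softmax cross-entropy w.r.t. the final-layer bias.

    For each class c this equals mean(p_c) - (fraction of nodes labeled c).
    """
    probs = softmax(logits)
    one_hot = np.eye(num_classes)[labels]
    return (probs - one_hot).mean(axis=0)


def estimate_label_distribution(bias_grad, assumed_mean_probs):
    """Invert the identity above using an ASSUMED mean prediction vector."""
    est = np.clip(assumed_mean_probs - bias_grad, 0.0, None)
    return est / est.sum()


# Hypothetical target client with a skewed (non-i.i.d.) label distribution.
rng = np.random.default_rng(0)
num_classes = 4
true_dist = np.array([0.6, 0.25, 0.1, 0.05])
labels = rng.choice(num_classes, size=2000, p=true_dist)

# Synthetic logits standing in for a nearly untrained model's outputs.
logits = rng.normal(scale=0.01, size=(labels.size, num_classes))

# What the server would observe: only the shared final-layer bias gradient.
bias_grad = final_layer_bias_gradient(logits, labels, num_classes)

# Attacker assumption: predictions are roughly uniform early in training.
uniform = np.full(num_classes, 1.0 / num_classes)
estimate = estimate_label_distribution(bias_grad, uniform)

empirical = np.bincount(labels, minlength=num_classes) / labels.size
print("empirical:", np.round(empirical, 3))
print("estimate: ", np.round(estimate, 3))
```

The uniform-prediction assumption breaks down once the model has trained for a while. That is one reason a learned attack model fitted on simulated client distributions is a plausible design rather than a closed-form inversion like this one.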