CR | Jun 4 | Code
Protecting K-Nearest Neighbor Queries from Location Inference Attacks
Zhiyu Sun, Jie Fu, Xinpeng Ling et al.
The k-nearest neighbor query (kNNQ) is a core component of modern location-based services (LBS) and has been widely adopted in popular features such as "people nearby". However, its potential privacy risks have long been overlooked. In this work, we present the first two attacks against kNNQ, namely the geometric intersection location inference attack (GI-LIA) and the zero-order optimization location inference attack (ZO-LIA), revealing the inherent location privacy risks posed by kNNQ. To mitigate these privacy risks, we further propose DPRS, a differential privacy framework for kNNQ protection. The core idea of DPRS is to incorporate a rejection sampling mechanism within a constrained perturbation interval, thereby mitigating the distance distortion caused by excessive noise injection. In addition, we design a private interval construction algorithm to construct the perturbation interval, enabling the rejection sampling mechanism to achieve a more favorable trade-off between privacy protection and query utility in kNNQ. Extensive experiments on real-world spatial datasets demonstrate that DPRS outperforms existing methods in both privacy protection and query utility. Our code is available at https://github.com/reanatom/DPRS.
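As a rough illustration of rejection sampling inside a constrained perturbation interval, the sketch below resamples Laplace noise until the perturbed distance falls within given bounds. It is not the DPRS mechanism: the function names, the Laplace noise choice, and the fixed bounds are all assumptions made for illustration, and restricting the output range in this way changes the privacy accounting, which the sketch does not address.

```python
import random

def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def perturb_in_interval(distance, scale, lower, upper, rng, max_tries=10_000):
    """Resample the noisy distance until it lands in [lower, upper].

    Hypothetical helper: the interval is passed in directly here, whereas
    the abstract describes a separate private interval construction step.
    """
    for _ in range(max_tries):
        noisy = distance + laplace_noise(scale, rng)
        if lower <= noisy <= upper:
            return noisy
    raise RuntimeError("no sample accepted; the interval may be too narrow")

rng = random.Random(0)
noisy = perturb_in_interval(distance=120.0, scale=50.0,
                            lower=80.0, upper=160.0, rng=rng)
```

Because every accepted value lies in the interval, the reported distance can never be distorted by more than the interval width, which is the intuition behind limiting "excessive noise injection".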
LG | Aug 21, 2023 | Code
ALI-DPFL: Differentially Private Federated Learning with Adaptive Local Iterations
Xinpeng Ling, Jie Fu, Kuncan Wang et al.
Federated Learning (FL) is a distributed machine learning technique that allows model training among multiple devices or organizations by sharing training parameters instead of raw data. However, adversaries can still infer individual information through inference attacks (e.g., differential attacks) on these training parameters. As a result, Differential Privacy (DP) has been widely used in FL to prevent such attacks. We consider differentially private federated learning in a resource-constrained scenario, where both the privacy budget and the number of communication rounds are limited. By theoretically analyzing the convergence, we find the optimal number of local DPSGD iterations for clients between any two sequential global updates. Based on this, we design an algorithm for Differentially Private Federated Learning with Adaptive Local Iterations (ALI-DPFL). We evaluate our algorithm on the MNIST, FashionMNIST and CIFAR10 datasets, and demonstrate significantly better performance than previous work in the resource-constrained scenario. Code is available at https://github.com/cheng-t/ALI-DPFL.
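For readers unfamiliar with the local DPSGD iterations mentioned above, here is a minimal sketch of one generic DPSGD step: clip each per-example gradient, sum, add Gaussian noise, and average. It does not implement the adaptive choice of the iteration count that ALI-DPFL contributes, and the function and parameter names are illustrative assumptions.

```python
import math
import random

def clip(grad, max_norm):
    """Scale a gradient so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * factor for g in grad]

def dpsgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One generic DPSGD update on a flat list of weights."""
    clipped = [clip(g, max_norm) for g in per_example_grads]
    batch_size = len(clipped)
    new_weights = []
    for i, w in enumerate(weights):
        total = sum(g[i] for g in clipped)
        # Gaussian noise with standard deviation noise_multiplier * max_norm
        total += rng.gauss(0.0, noise_multiplier * max_norm)
        new_weights.append(w - lr * total / batch_size)
    return new_weights

rng = random.Random(0)
updated = dpsgd_step([0.0, 0.0], [[3.0, 4.0], [0.3, 0.4]],
                     lr=1.0, max_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Each such step consumes privacy budget, which is why the number of local iterations between global updates matters when both the budget and the communication rounds are limited.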
LG | Aug 20, 2024 | Code
Single-cell Curriculum Learning-based Deep Graph Embedding Clustering
Huifa Li, Jie Fu, Xinpeng Ling et al.
The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies enables the investigation of cellular-level tissue heterogeneity. Cell annotation significantly contributes to the extensive downstream analysis of scRNA-seq data. However, the analysis of scRNA-seq data for biological inference is challenging owing to its intricate and indeterminate data distribution, characterized by a substantial volume and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of GNNs, a popular solution for scRNA-seq data clustering, can be harmed by two types of low-quality training nodes: 1) nodes on the boundary; 2) nodes that contribute little additional information to the graph. To address these problems, we propose a single-cell curriculum learning-based deep graph embedding clustering method (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multi-criteria (ChebAE) that combines three optimization objectives, namely the topology reconstruction loss of cell graphs, the zero-inflated negative binomial (ZINB) loss, and the clustering loss, to learn cell-cell topology representations. Meanwhile, we employ a selective training strategy that trains the GNN based on the features and entropy of nodes and prunes difficult nodes according to their difficulty scores to keep a high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods. The code of scCLG will be made publicly available at https://github.com/LFD-byte/scCLG.
CR | Nov 14, 2022
SA-DPSGD: Differentially Private Stochastic Gradient Descent based on Simulated Annealing
Jie Fu, Zhili Chen, XinPeng Ling
Differential privacy (DP) provides a formal privacy guarantee that prevents adversaries with access to machine learning models from extracting information about individual training points. Differentially private stochastic gradient descent (DPSGD) is the most popular training method with differential privacy in image recognition. However, existing DPSGD schemes lead to significant performance degradation, which hinders the application of differential privacy. In this paper, we propose a simulated annealing-based differentially private stochastic gradient descent scheme (SA-DPSGD), which accepts a candidate update with a probability that depends both on the update quality and on the number of iterations. Through this random update screening, we make the differentially private gradient descent proceed in the right direction in each iteration, which ultimately yields a more accurate model. In our experiments, under the same hyperparameters, our scheme achieves test accuracies of 98.35%, 87.41% and 60.92% on the MNIST, FashionMNIST and CIFAR10 datasets, respectively, compared to the state-of-the-art results of 98.12%, 86.33% and 59.34%. With freely adjusted hyperparameters, our scheme achieves even higher accuracies of 98.89%, 88.50% and 64.17%. We believe that our method contributes substantially to closing the accuracy gap between private and non-private image classification.
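The acceptance rule described above can be pictured with a generic simulated-annealing criterion. The sketch below is an assumption, not the rule from SA-DPSGD: it always accepts an update that lowers the loss and accepts a worse update with a probability that shrinks as the loss increase and the iteration count grow. The exponential form and the `beta` parameter are hypothetical.

```python
import math
import random

def accept_probability(delta_loss, iteration, beta=1.0):
    """Probability of keeping a candidate update.

    delta_loss: loss after the update minus loss before it (update quality).
    iteration: current iteration count; later iterations are stricter.
    """
    if delta_loss <= 0:
        return 1.0  # the update did not hurt, so always keep it
    return math.exp(-beta * delta_loss * iteration)

def accept_update(delta_loss, iteration, rng, beta=1.0):
    """Randomly decide whether to keep the candidate update."""
    return rng.random() < accept_probability(delta_loss, iteration, beta)

rng = random.Random(0)
decisions = [accept_update(0.05, t, rng) for t in range(1, 101)]
```

Under this form, early iterations tolerate occasional bad noisy updates, while later ones reject almost all of them, which matches the general annealing intuition of cooling over time.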
LG | Nov 6, 2023
DP-DCAN: Differentially Private Deep Contrastive Autoencoder Network for Single-cell Clustering
Huifa Li, Jie Fu, Zhili Chen et al.
Single-cell RNA sequencing (scRNA-seq) is important to transcriptomic analysis of gene expression. Recently, deep learning has facilitated the analysis of high-dimensional single-cell data. Unfortunately, deep learning models may leak sensitive information about users. As a result, Differential Privacy (DP) is increasingly used to protect privacy. However, existing DP methods usually perturb the whole neural network to achieve differential privacy, and hence incur large performance overheads. To address this challenge, we exploit a distinctive property of the autoencoder, namely that it outputs only the dimension-reduced vector in the middle of the network, and design a Differentially Private Deep Contrastive Autoencoder Network (DP-DCAN) based on partial network perturbation for single-cell clustering. Since noise is added to only part of the network, the performance improvement is twofold: one part of the network is trained with less noise thanks to a larger privacy budget, and the other part is trained without any noise. Experimental results on six datasets verify that DP-DCAN is superior to the traditional DP scheme with whole-network perturbation. Moreover, DP-DCAN demonstrates strong robustness to adversarial attacks.
LG | Oct 29, 2025 | Code
FreLE: Low-Frequency Spectral Bias in Neural Networks for Time-Series Tasks
Jialong Sun, Xinpeng Ling, Jiaxuan Zou et al.
The inherent autocorrelation of time series data presents an ongoing challenge to multivariate time series prediction. Recently, a widely adopted approach has been the incorporation of frequency domain information to assist in long-term prediction tasks. Many researchers have independently observed the spectral bias phenomenon in neural networks, where models tend to fit low-frequency signals before high-frequency ones. However, these observations have often been attributed to the specific architectures designed by the researchers, rather than recognizing the phenomenon as a universal characteristic across models. To unify the understanding of the spectral bias phenomenon in long-term time series prediction, we conducted extensive empirical experiments to measure spectral bias in existing mainstream models. Our findings reveal that virtually all models exhibit this phenomenon. To mitigate the impact of spectral bias, we propose the FreLE (Frequency Loss Enhancement) algorithm, which enhances model generalization through both explicit and implicit frequency regularization. This is a plug-and-play model loss function unit. A large number of experiments have proven the superior performance of FreLE. Code is available at https://github.com/Chenxing-Xuan/FreLE.
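To make the idea of an explicit frequency-domain loss term concrete, the sketch below adds a spectrum-matching penalty to an ordinary time-domain error. It is not the FreLE loss: the naive DFT, the mean absolute spectral difference, and the mixing weight `alpha` are assumptions chosen only to show how such a plug-in term can be attached to an existing loss.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform of a real-valued sequence."""
    n = len(signal)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(signal))
        for k in range(n)
    ]

def frequency_loss(pred, target):
    """Mean absolute difference between the two spectra."""
    spec_pred, spec_target = dft(pred), dft(target)
    return sum(abs(a - b) for a, b in zip(spec_pred, spec_target)) / len(spec_pred)

def combined_loss(pred, target, alpha=0.5):
    """Blend time-domain MSE with the frequency-domain term (alpha is hypothetical)."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    return (1 - alpha) * mse + alpha * frequency_loss(pred, target)

target = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
pred = [0.0, 0.5, 0.0, -0.5, 0.0, 0.5, 0.0, -0.5]
loss = combined_loss(pred, target)
```

Because the frequency term compares every spectral bin, errors in high-frequency components contribute to the loss even when the time-domain error is small, which is one way to counteract a low-frequency bias.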
LG | Aug 7, 2025 | Code
scAGC: Learning Adaptive Cell Graphs with Contrastive Guidance for Single-Cell Clustering
Huifa Li, Jie Fu, Xinlin Zhuang et al.
Accurate cell type annotation is a crucial step in analyzing single-cell RNA sequencing (scRNA-seq) data, which provides valuable insights into cellular heterogeneity. However, due to the high dimensionality and prevalence of zero elements in scRNA-seq data, traditional clustering methods face significant statistical and computational challenges. While some advanced methods use graph neural networks to model cell-cell relationships, they often depend on static graph structures that are sensitive to noise and fail to capture the long-tailed distribution inherent in single-cell populations. To address these limitations, we propose scAGC, a single-cell clustering method that learns adaptive cell graphs with contrastive guidance. Our approach optimizes feature representations and cell graphs simultaneously in an end-to-end manner. Specifically, we introduce a topology-adaptive graph autoencoder that leverages a differentiable Gumbel-Softmax sampling strategy to dynamically refine the graph structure during training. This adaptive mechanism mitigates the problem of a long-tailed degree distribution by promoting a more balanced neighborhood structure. To model the discrete, over-dispersed, and zero-inflated nature of scRNA-seq data, we integrate a Zero-Inflated Negative Binomial (ZINB) loss for robust feature reconstruction. Furthermore, a contrastive learning objective is incorporated to regularize the graph learning process and prevent abrupt changes in the graph topology, ensuring stability and enhancing convergence. Comprehensive experiments on 9 real scRNA-seq datasets demonstrate that scAGC consistently outperforms other state-of-the-art methods, yielding the best NMI and ARI scores on 9 and 7 datasets, respectively. Our code is available at Anonymous Github.
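The ZINB loss mentioned above is the negative log-likelihood of a zero-inflated negative binomial distribution. The sketch below uses the standard mean/dispersion parameterization with a single scalar set of parameters; in a model such as the one described, these would be predicted per gene and per cell by the decoder. The function names are illustrative assumptions.

```python
import math

def nb_log_pmf(x, mean, dispersion):
    """Log-probability of count x under a negative binomial (mean, dispersion)."""
    return (
        math.lgamma(x + dispersion) - math.lgamma(dispersion) - math.lgamma(x + 1)
        + dispersion * math.log(dispersion / (dispersion + mean))
        + x * math.log(mean / (dispersion + mean))
    )

def zinb_pmf(x, mean, dispersion, dropout):
    """ZINB probability: extra mass `dropout` at zero, NB otherwise."""
    nb = math.exp(nb_log_pmf(x, mean, dispersion))
    if x == 0:
        return dropout + (1.0 - dropout) * nb
    return (1.0 - dropout) * nb

def zinb_nll(counts, mean, dispersion, dropout):
    """Negative log-likelihood of observed counts, usable as a reconstruction loss."""
    return -sum(math.log(zinb_pmf(x, mean, dispersion, dropout)) for x in counts)

loss = zinb_nll([0, 0, 3, 1, 0, 7], mean=3.0, dispersion=2.0, dropout=0.3)
```

The dropout term lets the model explain the many zeros in scRNA-seq counts without forcing the mean toward zero, while the dispersion parameter captures over-dispersion.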
LG | May 21, 2025 | Code
EC-LDA: Label Distribution Inference Attack against Federated Graph Learning with Embedding Compression
Tong Cheng, Jie Fu, Xinpeng Ling et al.
Graph Neural Networks (GNNs) have been widely used for graph analysis. Federated Graph Learning (FGL) is an emerging learning framework for collaboratively training on graph data from various clients. Although FGL allows client data to remain localized, a malicious server can still steal private client data through uploaded gradients. In this paper, we propose, for the first time, label distribution attacks (LDAs) on FGL that aim to infer the label distributions of the client-side data. First, we observe that the effectiveness of LDA is closely related to the variance of node embeddings in GNNs. Next, we analyze the relation between them and propose a new attack named EC-LDA, which significantly improves the attack effectiveness by compressing node embeddings. Then, extensive experiments on node classification and link prediction tasks across six widely used graph datasets show that EC-LDA outperforms the SOTA LDAs. Specifically, EC-LDA can achieve a Cos-sim as high as 1.0 in almost all cases. Finally, we explore the robustness of EC-LDA under differential privacy protection and discuss potentially effective defenses against EC-LDA. Our code is available at https://github.com/cheng-t/EC-LDA.
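The Cos-sim figure quoted above is presumably the cosine similarity between the inferred and the true label distributions; under that assumption, the metric can be computed as in the sketch below. The example distributions are made up for illustration and are not results from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Hypothetical label distributions over three classes
true_dist = [0.5, 0.3, 0.2]
inferred_dist = [0.45, 0.35, 0.20]
score = cosine_similarity(true_dist, inferred_dist)
```

A score of 1.0 means the inferred distribution points in exactly the same direction as the true one, that is, the attack recovers the class proportions exactly.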
CR | May 14, 2024
Differentially Private Federated Learning: A Systematic Review
Jie Fu, Yuan Hong, Xinpeng Ling et al.
In recent years, privacy and security concerns in machine learning have promoted trusted federated learning to the forefront of research. Differential privacy has emerged as the de facto standard for privacy protection in federated learning due to its rigorous mathematical foundation and provable guarantee. Despite extensive research on algorithms that incorporate differential privacy within federated learning, there remains an evident lack of systematic reviews that categorize and synthesize these studies. Our work presents a systematic overview of differentially private federated learning. Existing taxonomies have not adequately considered the objects and levels of privacy protection provided by various differential privacy models in federated learning. To close this gap, we propose a new taxonomy of differentially private federated learning based on the definitions and guarantees of various differential privacy models and federated scenarios. Our classification allows for a clear delineation of the protected objects across various differential privacy models and their respective neighborhood levels within federated learning environments. Furthermore, we explore the applications of differential privacy in federated learning scenarios. Our work provides valuable insights into privacy-preserving federated learning and suggests practical directions for future research.