James Joshi

CR
h-index 16
13 papers
464 citations
Novelty 53%
AI Score 51

13 Papers

CR May 11
Conformal-DP: A Density-Aware Mechanism for Differential Privacy over Riemannian Manifolds via Conformal Transformation

Peilin He, Liou Tang, M. Amin Rahimian et al.

Differential Privacy (DP) is being increasingly adopted for non-Euclidean data that lie on complex, high-dimensional manifolds. Existing DP mechanisms for manifold data consider geometric properties when calibrating privacy perturbations, but they largely fail to capture variations in data density within datasets, leading to biased perturbations and suboptimal privacy-utility trade-offs due to heterogeneous data distributions. In this paper, we propose a novel density-aware differential privacy mechanism on Riemannian manifolds, referred to as Conformal-DP, that leverages conformal transformations to calibrate perturbations based on local densities and to induce a density-balanced geometry. We prove that our mechanism satisfies $ε$-differential privacy on any complete Riemannian manifold under mild regularity assumptions. In addition, we derive a closed-form expected geodesic error bound that depends only on the underlying data density ratio and is independent of global curvature. Our empirical results on synthetic and real-world datasets demonstrate that the proposed Conformal-DP mechanism substantially improves the privacy-utility trade-off in heterogeneous data distribution settings, with worst-case performance comparable to state-of-the-art manifold DP mechanisms that assume uniformly distributed data.
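For reference, the $ε$-differential privacy guarantee mentioned in this abstract is usually stated as follows. The symbols $\mathcal{M}$ (the randomized mechanism), $D$ and $D'$ (adjacent datasets), and $S$ (a measurable set of outputs, here a subset of the manifold) are notation assumed for this note, not taken from the abstract:

```latex
\Pr\bigl[\mathcal{M}(D) \in S\bigr] \;\le\; e^{\varepsilon}\,\Pr\bigl[\mathcal{M}(D') \in S\bigr]
\quad \text{for all adjacent } D, D' \text{ and all measurable } S .
```

A smaller $\varepsilon$ forces the two output distributions closer together, which gives stronger privacy but typically requires larger perturbations.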

CR Feb 8, 2025
Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning

Runhua Xu, Shiqi Gao, Chao Li et al.

Federated learning (FL) is inherently susceptible to privacy breaches and poisoning attacks. To tackle these challenges, researchers have separately devised secure aggregation mechanisms to protect data privacy and robust aggregation methods that withstand poisoning attacks. However, simultaneously addressing both concerns is challenging; secure aggregation facilitates poisoning attacks as most anomaly detection techniques require access to unencrypted local model updates, which are obscured by secure aggregation. The few recent efforts to tackle both challenges simultaneously often depend on the impractical assumption of non-colluding two-server setups that disrupt FL's topology, or on three-party computation, which introduces scalability issues, complicating deployment and application. To overcome this dilemma, this paper introduces a Dual Defense Federated learning (DDFed) framework. DDFed simultaneously boosts privacy protection and mitigates poisoning attacks, without introducing new participant roles or disrupting the existing FL topology. DDFed initially leverages cutting-edge fully homomorphic encryption (FHE) to securely aggregate model updates, without the impractical requirement for non-colluding two-server setups, and ensures strong privacy protection. Additionally, we propose a unique two-phase anomaly detection mechanism for encrypted model updates, featuring secure similarity computation and feedback-driven collaborative selection, with additional measures to prevent potential privacy breaches from Byzantine clients incorporated into the detection process. We conducted extensive experiments on various model poisoning attacks and FL scenarios, including both cross-device and cross-silo FL. Experiments on publicly available datasets demonstrate that DDFed successfully protects model privacy and effectively defends against model poisoning threats.

LG Jun 30, 2025
PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction

Peilin He, James Joshi

Reconstructing high-quality images from low-resolution inputs using Residual Dense Spatial Networks (RDSNs) is crucial yet challenging. It is even more challenging in centralized training where multiple collaborating parties are involved, as it poses significant privacy risks, including data leakage and inference attacks, as well as high computational and communication costs. We propose a novel Privacy-Preserving Federated Learning-based RDSN (PPFL-RDSN) framework specifically tailored for encrypted lossy image reconstruction. PPFL-RDSN integrates Federated Learning (FL), local differential privacy, and robust model watermarking techniques to ensure that data remains secure on local clients/devices, safeguards privacy-sensitive information, and maintains model authenticity without revealing underlying data. Empirical evaluations show that PPFL-RDSN achieves comparable performance to the state-of-the-art centralized methods while reducing computational burdens, and effectively mitigates security and privacy vulnerabilities, making it a practical solution for secure and privacy-preserving collaborative computer vision applications.

LG Jun 11, 2025
Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning

Liou Tang, James Joshi, Ashish Kundu

Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.

CR Apr 30, 2025
Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning

Zhiyong Jin, Runhua Xu, Chao Li et al.

Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy, yet it faces significant challenges in communication efficiency and vulnerability to poisoning attacks. While sparsification techniques mitigate communication overhead by transmitting only critical model parameters, they inadvertently amplify security risks: adversarial clients can exploit sparse updates to evade detection and degrade model performance. Existing defense mechanisms, designed for standard FL communication scenarios, are ineffective in addressing these vulnerabilities within sparsified FL. To bridge this gap, we propose FLARE, a novel federated learning framework that integrates sparse index mask inspection and model update sign similarity analysis to detect and mitigate poisoning attacks in sparsified FL. Extensive experiments across multiple datasets and adversarial scenarios demonstrate that FLARE significantly outperforms existing defense strategies, effectively securing sparsified FL against poisoning attacks while maintaining communication efficiency.
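The two signals named in this abstract, sparse index masks and update sign similarity, can be illustrated with a minimal sketch. This is not FLARE's algorithm: the abstract does not specify how the signals are computed or combined, so the Jaccard overlap, the sign-agreement ratio, the function names, and the toy updates below are all assumptions chosen for illustration.

```python
def mask_jaccard(update_a, update_b):
    """Jaccard overlap of the index sets two sparse updates transmit."""
    a, b = set(update_a), set(update_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _sign(x):
    # Returns -1, 0, or 1.
    return (x > 0) - (x < 0)


def sign_agreement(update_a, update_b):
    """Fraction of shared indices on which two sparse updates agree in sign."""
    shared = set(update_a) & set(update_b)
    if not shared:
        return 0.0
    same = sum(1 for i in shared if _sign(update_a[i]) == _sign(update_b[i]))
    return same / len(shared)


# Hypothetical sparse updates: parameter index -> transmitted value.
honest_1 = {0: 0.5, 3: -0.2, 7: 0.1}
honest_2 = {0: 0.4, 3: -0.1, 8: 0.3}
flipped = {0: -0.5, 3: 0.2, 7: -0.1}  # same mask as honest_1, opposite signs

print(mask_jaccard(honest_1, honest_2), sign_agreement(honest_1, honest_2))
print(mask_jaccard(honest_1, flipped), sign_agreement(honest_1, flipped))
```

In this toy case the sign-flipped update has a mask identical to an honest client's but agrees with it on no coordinate, which is the kind of discrepancy a sign-based check is meant to surface.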

LG Aug 10, 2021
Privacy-Preserving Machine Learning: Methods, Challenges and Directions

Runhua Xu, Nathalie Baracaldo, James Joshi

Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model relies on a large volume of training data and high-powered computational resources. Such a need for and the use of huge volumes of data raise serious privacy concerns because of the potential risks of leakage of highly privacy-sensitive information; further, the evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data add significant challenges to fully benefiting from the power of ML for data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Increasingly, significant research efforts from both academia and industry can be seen in PPML areas that aim toward integrating privacy-preserving techniques into the ML pipeline or specific algorithms, or designing various PPML architectures. In particular, existing PPML research cross-cuts ML, systems and applications design, as well as security and privacy areas; hence, there is a critical need to understand state-of-the-art research, related challenges, and a roadmap for future research in the PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage as well as benefit multiple research communities such as ML, distributed systems, security and privacy.

LG May 18, 2021
Adaptive ABAC Policy Learning: A Reinforcement Learning Approach

Leila Karimi, Mai Abdelhakim, James Joshi

With rapid advances in computing systems, there is an increasing demand for more effective and efficient access control (AC) approaches. Recently, Attribute Based Access Control (ABAC) approaches have been shown to be promising in fulfilling the AC needs of such emerging complex computing environments. An ABAC model grants access to a requester based on attributes of entities in a system and an authorization policy; however, its generality and flexibility come with a higher cost. Further, increasing complexities of organizational systems and the need for federated accesses to their resources make the task of AC enforcement and management much more challenging. In this paper, we propose an adaptive ABAC policy learning approach to automate the authorization management task. We model ABAC policy learning as a reinforcement learning problem. In particular, we propose a contextual bandit system, in which an authorization engine adapts an ABAC model through a feedback control loop; it relies on interacting with users/administrators of the system to receive their feedback that assists the model in making authorization decisions. We propose four methods for initializing the learning model and a planning approach based on attribute value hierarchy to accelerate the learning process. We focus on developing an adaptive ABAC policy learning model for a home IoT environment as a running example. We evaluate our proposed approach over real and synthetic data. We consider both complete and sparse datasets in our evaluations. Our experimental results show that the proposed approach achieves performance that is comparable to ones based on supervised learning in many scenarios and even outperforms them in several situations.
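The feedback loop described in this abstract can be sketched as a tabular epsilon-greedy contextual bandit. This is a stand-in, not the paper's model: the class name, the reward scheme (+1 when an administrator confirms a decision, -1 when they reject it), the default-deny tie-break, and the home IoT attribute tuple are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class AuthorizationBandit:
    """Tabular epsilon-greedy bandit over (context, action) pairs."""

    # "deny" comes first so that ties resolve to deny (an assumed default).
    ACTIONS = ("deny", "permit")

    def __init__(self, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = defaultdict(int)    # times each (context, action) got feedback
        self.values = defaultdict(float)  # running mean reward per (context, action)

    def decide(self, context):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.ACTIONS)
        return max(self.ACTIONS, key=lambda a: self.values[(context, a)])

    def feedback(self, context, action, reward):
        # Incremental update of the mean reward for this (context, action).
        key = (context, action)
        self.counts[key] += 1
        self.values[key] += (reward - self.values[key]) / self.counts[key]


# Hypothetical request: (subject role, resource, environment).
ctx = ("parent", "thermostat", "at_home")
engine = AuthorizationBandit(epsilon=0.0)  # no exploration, for a repeatable demo

first = engine.decide(ctx)         # nothing learned yet, falls back to "deny"
engine.feedback(ctx, first, -1.0)  # administrator rejects the denial
second = engine.decide(ctx)        # the engine now prefers "permit"
engine.feedback(ctx, second, 1.0)  # administrator confirms
print(first, second)
```

A tabular estimate treats every attribute combination independently; the attribute value hierarchy mentioned in the abstract is one way such feedback could be shared across related contexts, which this sketch does not attempt.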

LG Mar 5, 2021
FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data

Runhua Xu, Nathalie Baracaldo, Yi Zhou et al.

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are only available to a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate the applicability for multiple types of ML models and show a reduction of 10% to 70% in training time and 80% to 90% in data transfer with respect to the state-of-the-art approaches.

CR Feb 2, 2021
Blockchain-based Transparency Framework for Privacy Preserving Third-party Services

Runhua Xu, Chao Li, James Joshi

Increasingly, information systems rely on computational, storage, and network resources deployed in third-party facilities such as cloud centers and edge nodes. Such an approach further exacerbates cybersecurity concerns constantly raised by numerous incidents of security and privacy attacks resulting in data leakage and identity theft, among others. These have, in turn, forced the creation of stricter security and privacy-related regulations and have eroded the trust in cyberspace. In particular, security-related services and infrastructures, such as Certificate Authorities (CAs) that provide digital certificate services and Third-Party Authorities (TPAs) that provide cryptographic key services, are critical components for establishing trust in crypto-based privacy-preserving applications and services. To address such trust issues, various transparency frameworks and approaches have been recently proposed in the literature. This paper proposes the TAB framework, which provides transparency and trustworthiness of third-party authorities and third-party facilities using blockchain techniques for emerging crypto-based privacy-preserving applications. TAB employs the Ethereum blockchain as the underlying public ledger and also includes a novel smart contract to automate accountability with an incentive mechanism that motivates users to participate in auditing, and punishes unintentional or malicious behaviors. We implement TAB and show through experimental evaluation in the Ethereum official test network, Rinkeby, that the framework is efficient. We also formally show the security guarantee provided by TAB, and analyze the privacy guarantee and trustworthiness it provides.

LG Dec 18, 2020
NN-EMD: Efficiently Training Neural Networks using Encrypted Multi-Sourced Datasets

Runhua Xu, James Joshi, Chao Li

Training a machine learning model over an encrypted dataset is an existing promising approach to address the privacy-preserving machine learning task; however, it is extremely challenging to efficiently train a deep neural network (DNN) model over encrypted data for two reasons: first, it requires large-scale computation over huge datasets; second, the existing solutions for computation over encrypted data, such as homomorphic encryption, are inefficient. Further, for an enhanced performance of a DNN model, we also need to use huge training datasets composed of data from multiple data sources that may not have pre-established trust relationships among each other. We propose a novel framework, NN-EMD, to train DNNs over multiple encrypted datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate our framework for performance with regard to the training time and model accuracy on the MNIST dataset. Compared to other existing frameworks, our proposed NN-EMD framework can significantly reduce the training time, while providing comparable model accuracy and privacy guarantees as well as supporting multiple data sources. Furthermore, the depth and complexity of neural networks do not affect the training time despite introducing a privacy-preserving NN-EMD setting.

CR Nov 12, 2020
Revisiting Secure Computation Using Functional Encryption: Opportunities and Research Directions

Runhua Xu, James Joshi

Increasing incidents of security compromises and privacy leakage have raised serious privacy concerns related to cyberspace. Such privacy concerns have been instrumental in the creation of several regulations and acts to restrict the availability and use of privacy-sensitive data. The secure computation problem, initially and formally introduced as secure two-party computation by Andrew Yao in 1986, has been the focus of intense research in academia because of its fundamental role in building many of the existing privacy-preserving approaches. Most of the existing secure computation solutions rely on garbled-circuits and homomorphic encryption techniques to tackle secure computation issues, including efficiency and security guarantees. However, it is still challenging to adopt these secure computation approaches in emerging compute-intensive and data-intensive applications such as emerging machine learning solutions. Recently proposed functional encryption schemes have shown promise as an underlying secure computation foundation in recent privacy-preserving machine learning approaches. This paper revisits the secure computation problem using emerging and promising functional encryption techniques and presents a comprehensive study. We first briefly summarize existing conventional secure computation approaches built on garbled-circuits, oblivious transfer, and homomorphic encryption techniques. Then, we elaborate on the unique characteristics and challenges of emerging functional encryption based secure computation approaches and outline several research directions.

CR Mar 16, 2020
An Automatic Attribute Based Access Control Policy Extraction from Access Logs

Leila Karimi, Maryam Aldairi, James Joshi et al.

With the rapid advances in computing and information technologies, traditional access control models have become inadequate in terms of capturing fine-grained and expressive security requirements of newly emerging applications. An attribute-based access control (ABAC) model provides a more flexible approach for addressing the authorization needs of complex and dynamic systems. While organizations are interested in employing newer authorization models, migrating to such models poses a significant challenge. Many large-scale businesses need to grant authorization to their user populations that are potentially distributed across disparate and heterogeneous computing environments. Each of these computing environments may have its own access control model. The manual development of a single policy framework for an entire organization is tedious, costly, and error-prone. In this paper, we present a methodology for automatically learning ABAC policy rules from access logs of a system to simplify the policy development process. The proposed approach employs an unsupervised learning-based algorithm for detecting patterns in access logs and extracting ABAC authorization rules from these patterns. In addition, we present two policy improvement algorithms, rule pruning and policy refinement, to generate a higher quality mined policy. Finally, we implement a prototype of the proposed approach to demonstrate its feasibility.

SI Sep 24, 2013
A Friendship Privacy Attack on Friends and 2-Distant Neighbors in Social Networks

Lei Jin, Xuelian Long, James Joshi

In an undirected social graph, a friendship link involves two users and the friendship is visible in both users' friend lists. Such a dual visibility of the friendship may raise privacy threats. This is because both users can separately control the visibility of a friendship link to other users and their privacy policies for the link may not be consistent. Even if one of them conceals the link from a third user, the third user may find such a friendship link from another user's friend list. In addition, as most users allow their friends to see their friend lists in most social network systems, an adversary can exploit the inconsistent policies to launch privacy attacks to identify and infer many of a targeted user's friends. In this paper, we propose, analyze and evaluate such an attack, called the Friendship Identification and Inference (FII) attack. In an FII attack scenario, we assume that an adversary can only see his friend list and the friend lists of his friends who do not hide the friend lists from him. An FII attack then consists of two steps: 1) friend identification and 2) friend inference. In the friend identification step, the adversary tries to identify a target's friends based on his friend list and those of his friends. In the friend inference step, the adversary attempts to infer the target's friends by using the proposed random walk with restart approach. We present experimental results using three real social network datasets and show that FII attacks are generally efficient and effective when adversaries and targets are friends or 2-distant neighbors. We also comprehensively analyze the attack results in order to find what values of parameters and network features could promote FII attacks. Currently, most popular social network systems with an undirected friendship graph, such as Facebook, LinkedIn and Foursquare, are susceptible to FII attacks.
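The random walk with restart used in the friend inference step can be sketched generically as below. This is the textbook technique on a toy graph, not the paper's exact procedure: seeding the walk at the target, the restart probability of 0.15, the function name, and the edge list are assumptions for illustration.

```python
from collections import defaultdict


def random_walk_with_restart(edges, seed, restart=0.15, iters=200, tol=1e-12):
    """Visiting probabilities of a walk that returns to `seed` with
    probability `restart` at every step, on an undirected graph."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    if seed not in neighbors:
        raise ValueError("seed must appear in at least one edge")

    nodes = list(neighbors)
    scores = {n: 0.0 for n in nodes}
    scores[seed] = 1.0

    for _ in range(iters):
        nxt = {n: 0.0 for n in nodes}
        for n in nodes:
            # Spread the non-restarting mass evenly over the neighbors of n.
            share = (1.0 - restart) * scores[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        nxt[seed] += restart  # the restarting mass returns to the seed
        delta = sum(abs(nxt[n] - scores[n]) for n in nodes)
        scores = nxt
        if delta < tol:
            break
    return scores


# Hypothetical friendship links visible to the adversary.
visible_edges = [
    ("adversary", "a"), ("adversary", "b"),
    ("a", "target"), ("b", "target"),
    ("target", "c"), ("c", "d"),
]
scores = random_walk_with_restart(visible_edges, seed="target")
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Nodes with higher scores are more closely tied to the seed in the visible graph, so a ranking like this could serve as a list of candidate friends to check.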