Runhua Xu

h-index16

17papers

865citations

Novelty48%

AI Score51

Ranked #39,775 of 205,806 authors (top 19%)#728 in CR (top 10%)

17 Papers

88.6CLMay 28Code

CRITIC-R1: Learning Structured Critics for Retrieval-Augmented Generation

Wenhan Xiao, Ziwei Zhang, Chuanyue Yu et al.

Retrieval-augmented generation (RAG) improves knowledge-intensive question answering by incorporating external evidence. However, existing RAG methods still suffer from hallucinations and subtle reasoning errors. Recent studies introduce external critics to refine RAG outputs, yet they often provide coarse-grained and weakly structured feedback, exhibit over-aggressive intervention, and lead to noisy and unreliable refinement, limiting their effectiveness for correction. To tackle these issues, we propose CRITIC-R1, a structured critic framework that formulates and learns RAG critique as an explicit error diagnosis problem using reinforcement learning (RL). Our framework categorizes common RAG errors into multiple diagnostic dimensions, including verdict, error location, reasoning analysis, and fix generation. To learn these capabilities, we design two reward functions: Conservative Judgement Alignment (CJA) first encourages calibrated high-level judgements while mitigating the over-aggressive phenomenon, whereas Diagnostic Quality Alignment (DQA) further improves fine-grained diagnostic feedback through gated rewards. We train the critic model using GRPO-based RL with process-level supervision collected from external LLM teacher models. Experiments across five QA benchmarks show that CRITIC-R1 consistently improves answer quality over strong RAG baselines. Our source code is available at https://anonymous.4open.science/r/critic-r1-FCB0

68.4CRMay 9

Toward Web 4.0: Bidirectional Trust between AI Agents and Blockchain

Yunfeng Xia, Chao Li, Lei Li et al.

Autonomous AI agents are increasingly deployed on blockchain platforms, yet the design space that governs their interaction remains poorly understood. This convergence, where autonomous agents operate on and within decentralized systems, is a defining feature of the emerging Web~4.0 paradigm. This paper presents a Systematization of Knowledge organized around a bidirectional trust framework. In the B $\boldsymbol{\rightarrow}$ A direction, we examine how blockchain provides trust infrastructure for agents, spanning identity and account abstraction, permission and delegation, intent-centric execution, and tokenized agent economies. In the A $\boldsymbol{\rightarrow}$ B direction, we examine the reverse: how AI agents participate in core blockchain mechanisms including security auditing, consensus, and governance. A Trust Foundation of verifiable computation underpins both directions, with each primitive offering different trade-offs between trust minimality, computational overhead, and deployment readiness. We formalize the interaction as an Agent-Blockchain Interaction Model (ABIM), catalog 70 Ethereum EIPs/ERCs, examine 20 representative industry projects, and review 118 academic papers, applying a five-dimensional framework assessing Verifiability, Minimality of Trust, Expressiveness, Composability, and Maturity. Our analysis uncovers significant gaps: the agent-specific standards ecosystem is overwhelmingly immature, intent architectures lack formal analysis, and while isolated works have begun to explore AI participation in consensus and governance, a unified security framing that treats AI as a first-class actor at the protocol layer remains absent. We propose a three-dimensional taxonomy, identify nine concrete open problems, and highlight the sharpest research opportunities at this intersection.

LGDec 29, 2025

FRoD: Full-Rank Efficient Fine-Tuning with Rotational Degrees for Fast Convergence

Guoan Wan, Tianyu Chen, Fangzheng Feng et al.

Parameter-efficient fine-tuning (PEFT) methods have emerged as a practical solution for adapting large foundation models to downstream tasks, reducing computational and memory costs by updating only a small subset of parameters. Among them, approaches like LoRA aim to strike a balance between efficiency and expressiveness, but often suffer from slow convergence and limited adaptation capacity due to their inherent low-rank constraints. This trade-off hampers the ability of PEFT methods to capture complex patterns needed for diverse tasks. To address these challenges, we propose FRoD, a novel fine-tuning method that combines hierarchical joint decomposition with rotational degrees of freedom. By extracting a globally shared basis across layers and injecting sparse, learnable perturbations into scaling factors for flexible full-rank updates, FRoD enhances expressiveness and efficiency, leading to faster and more robust convergence. On 20 benchmarks spanning vision, reasoning, and language understanding, FRoD matches full model fine-tuning in accuracy, while using only 1.72% of trainable parameters under identical training budgets.

CRJan 9, 2025

TAPFed: Threshold Secure Aggregation for Privacy-Preserving Federated Learning

Runhua Xu, Bo Li, Chao Li et al.

Federated learning is a computing paradigm that enhances privacy by enabling multiple parties to collaboratively train a machine learning model without revealing personal data. However, current research indicates that traditional federated learning platforms are unable to ensure privacy due to privacy leaks caused by the interchange of gradients. To achieve privacy-preserving federated learning, integrating secure aggregation mechanisms is essential. Unfortunately, existing solutions are vulnerable to recently demonstrated inference attacks such as the disaggregation attack. This paper proposes TAPFed, an approach for achieving privacy-preserving federated learning in the context of multiple decentralized aggregators with malicious actors. TAPFed uses a proposed threshold functional encryption scheme and allows for a certain number of malicious aggregators while maintaining security and privacy. We provide formal security and privacy analyses of TAPFed and compare it to various baselines through experimental evaluation. Our results show that TAPFed offers equivalent performance in terms of model quality compared to state-of-the-art approaches while reducing transmission overhead by 29%-45% across different model training scenarios. Most importantly, TAPFed can defend against recently demonstrated inference attacks caused by curious aggregators, which the majority of existing approaches are susceptible to.

CRFeb 8, 2025

Dual Defense: Enhancing Privacy and Mitigating Poisoning Attacks in Federated Learning

Runhua Xu, Shiqi Gao, Chao Li et al.

Federated learning (FL) is inherently susceptible to privacy breaches and poisoning attacks. To tackle these challenges, researchers have separately devised secure aggregation mechanisms to protect data privacy and robust aggregation methods that withstand poisoning attacks. However, simultaneously addressing both concerns is challenging; secure aggregation facilitates poisoning attacks as most anomaly detection techniques require access to unencrypted local model updates, which are obscured by secure aggregation. Few recent efforts to simultaneously tackle both challenges offen depend on impractical assumption of non-colluding two-server setups that disrupt FL's topology, or three-party computation which introduces scalability issues, complicating deployment and application. To overcome this dilemma, this paper introduce a Dual Defense Federated learning (DDFed) framework. DDFed simultaneously boosts privacy protection and mitigates poisoning attacks, without introducing new participant roles or disrupting the existing FL topology. DDFed initially leverages cutting-edge fully homomorphic encryption (FHE) to securely aggregate model updates, without the impractical requirement for non-colluding two-server setups and ensures strong privacy protection. Additionally, we proposes a unique two-phase anomaly detection mechanism for encrypted model updates, featuring secure similarity computation and feedback-driven collaborative selection, with additional measures to prevent potential privacy breaches from Byzantine clients incorporated into the detection process. We conducted extensive experiments on various model poisoning attacks and FL scenarios, including both cross-device and cross-silo FL. Experiments on publicly available datasets demonstrate that DDFed successfully protects model privacy and effectively defends against model poisoning threats.

CLApr 27, 2025

Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation

Qianren Mao, Qili Zhang, Hanwen Hao et al.

Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution for enhancing the accuracy and credibility of Large Language Models (LLMs), particularly in Question & Answer tasks. This is achieved by incorporating proprietary and private data from integrated databases. However, private RAG systems face significant challenges due to the scarcity of private domain data and critical data privacy issues. These obstacles impede the deployment of private RAG systems, as developing privacy-preserving RAG systems requires a delicate balance between data security and data availability. To address these challenges, we regard federated learning (FL) as a highly promising technology for privacy-preserving RAG services. We propose a novel framework called Federated Retrieval-Augmented Generation (FedE4RAG). This framework facilitates collaborative training of client-side RAG retrieval models. The parameters of these models are aggregated and distributed on a central-server, ensuring data privacy without direct sharing of raw data. In FedE4RAG, knowledge distillation is employed for communication between the server and client models. This technique improves the generalization of local RAG retrievers during the federated learning process. Additionally, we apply homomorphic encryption within federated learning to safeguard model parameters and mitigate concerns related to data leakage. Extensive experiments conducted on the real-world dataset have validated the effectiveness of FedE4RAG. The results demonstrate that our proposed framework can markedly enhance the performance of private RAG systems while maintaining robust data privacy protection.

CRApr 30, 2025

Sparsification Under Siege: Defending Against Poisoning Attacks in Communication-Efficient Federated Learning

Zhiyong Jin, Runhua Xu, Chao Li et al.

Federated Learning (FL) enables collaborative model training across distributed clients while preserving data privacy, yet it faces significant challenges in communication efficiency and vulnerability to poisoning attacks. While sparsification techniques mitigate communication overhead by transmitting only critical model parameters, they inadvertently amplify security risks: adversarial clients can exploit sparse updates to evade detection and degrade model performance. Existing defense mechanisms, designed for standard FL communication scenarios, are ineffective in addressing these vulnerabilities within sparsified FL. To bridge this gap, we propose FLARE, a novel federated learning framework that integrates sparse index mask inspection and model update sign similarity analysis to detect and mitigate poisoning attacks in sparsified FL. Extensive experiments across multiple datasets and adversarial scenarios demonstrate that FLARE significantly outperforms existing defense strategies, effectively securing sparsified FL against poisoning attacks while maintaining communication efficiency.

LGAug 10, 2021

Privacy-Preserving Machine Learning: Methods, Challenges and Directions

Runhua Xu, Nathalie Baracaldo, James Joshi

Machine learning (ML) is increasingly being adopted in a wide variety of application domains. Usually, a well-performing ML model relies on a large volume of training data and high-powered computational resources. Such a need for and the use of huge volumes of data raise serious privacy concerns because of the potential risks of leakage of highly privacy-sensitive information; further, the evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data add significant challenges to fully benefiting from the power of ML for data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Increasingly, significant research efforts from both academia and industry can be seen in PPML areas that aim toward integrating privacy-preserving techniques into ML pipeline or specific algorithms, or designing various PPML architectures. In particular, existing PPML research cross-cut ML, systems and applications design, as well as security and privacy areas; hence, there is a critical need to understand state-of-the-art research, related challenges and a research roadmap for future research in PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage as well as benefit multiple research communities such as ML, distributed systems, security and privacy.

LGMar 5, 2021

FedV: Privacy-Preserving Federated Learning over Vertically Partitioned Data

Runhua Xu, Nathalie Baracaldo, Yi Zhou et al.

Federated learning (FL) has been proposed to allow collaborative training of machine learning (ML) models among multiple parties where each party can keep its data private. In this paradigm, only model updates, such as model weights or gradients, are shared. Many existing approaches have focused on horizontal FL, where each party has the entire feature set and labels in the training data set. However, many real scenarios follow a vertically-partitioned FL setup, where a complete feature set is formed only when all the datasets from the parties are combined, and the labels are only available to a single party. Privacy-preserving vertical FL is challenging because complete sets of labels and features are not owned by one entity. Existing approaches for vertical FL require multiple peer-to-peer communications among parties, leading to lengthy training times, and are restricted to (approximated) linear models and just two parties. To close this gap, we propose FedV, a framework for secure gradient computation in vertical settings for several widely used ML models such as linear models, logistic regression, and support vector machines. FedV removes the need for peer-to-peer communication among parties by using functional encryption schemes; this allows FedV to achieve faster training times. It also works for larger and changing sets of parties. We empirically demonstrate the applicability for multiple types of ML models and show a reduction of 10%-70% of training time and 80% to 90% in data transfer with respect to the state-of-the-art approaches.

CRFeb 2, 2021

Blockchain-based Transparency Framework for Privacy Preserving Third-party Services

Runhua Xu, Chao Li, James Joshi

Increasingly, information systems rely on computational, storage, and network resources deployed in third-party facilities such as cloud centers and edge nodes. Such an approach further exacerbates cybersecurity concerns constantly raised by numerous incidents of security and privacy attacks resulting in data leakage and identity theft, among others. These have, in turn, forced the creation of stricter security and privacy-related regulations and have eroded the trust in cyberspace. In particular, security-related services and infrastructures, such as Certificate Authorities (CAs) that provide digital certificate services and Third-Party Authorities (TPAs) that provide cryptographic key services, are critical components for establishing trust in crypto-based privacy-preserving applications and services. To address such trust issues, various transparency frameworks and approaches have been recently proposed in the literature. This paper proposes TAB framework that provides transparency and trustworthiness of third-party authority and third-party facilities using blockchain techniques for emerging crypto-based privacy-preserving applications. TAB employs the Ethereum blockchain as the underlying public ledger and also includes a novel smart contract to automate accountability with an incentive mechanism that motivates users to participate in auditing, and punishes unintentional or malicious behaviors. We implement TAB and show through experimental evaluation in the Ethereum official test network, Rinkeby, that the framework is efficient. We also formally show the security guarantee provided by TAB, and analyze the privacy guarantee and trustworthiness it provides.

CRJan 30, 2021

SteemOps: Extracting and Analyzing Key Operations in Steemit Blockchain-based Social Media Platform

Chao Li, Balaji Palanisamy, Runhua Xu et al.

Advancements in distributed ledger technologies are driving the rise of blockchain-based social media platforms such as Steemit, where users interact with each other in similar ways as conventional social networks. These platforms are autonomously managed by users using decentralized consensus protocols in a cryptocurrency ecosystem. The deep integration of social networks and blockchains in these platforms provides potential for numerous cross-domain research studies that are of interest to both the research communities. However, it is challenging to process and analyze large volumes of raw Steemit data as it requires specialized skills in both software engineering and blockchain systems and involves substantial efforts in extracting and filtering various types of operations. To tackle this challenge, we collect over 38 million blocks generated in Steemit during a 45 month time period from 2016/03 to 2019/11 and extract ten key types of operations performed by the users. The results generate SteemOps, a new dataset that organizes more than 900 million operations from Steemit into three sub-datasets namely (i) social-network operation dataset (SOD), (ii) witness-election operation dataset (WOD) and (iii) value-transfer operation dataset (VOD). We describe the dataset schema and its usage in detail and outline possible future research studies using SteemOps. SteemOps is designed to facilitate future research aimed at providing deeper insights on emerging blockchain-based social media platforms.

LGDec 18, 2020

NN-EMD: Efficiently Training Neural Networks using Encrypted Multi-Sourced Datasets

Runhua Xu, James Joshi, Chao Li

Training a machine learning model over an encrypted dataset is an existing promising approach to address the privacy-preserving machine learning task, however, it is extremely challenging to efficiently train a deep neural network (DNN) model over encrypted data for two reasons: first, it requires large-scale computation over huge datasets; second, the existing solutions for computation over encrypted data, such as homomorphic encryption, is inefficient. Further, for an enhanced performance of a DNN model, we also need to use huge training datasets composed of data from multiple data sources that may not have pre-established trust relationships among each other. We propose a novel framework, NN-EMD, to train DNN over multiple encrypted datasets collected from multiple sources. Toward this, we propose a set of secure computation protocols using hybrid functional encryption schemes. We evaluate our framework for performance with regards to the training time and model accuracy on the MNIST datasets. Compared to other existing frameworks, our proposed NN-EMD framework can significantly reduce the training time, while providing comparable model accuracy and privacy guarantees as well as supporting multiple data sources. Furthermore, the depth and complexity of neural networks do not affect the training time despite introducing a privacy-preserving NN-EMD setting.

CRNov 12, 2020

Revisiting Secure Computation Using Functional Encryption: Opportunities and Research Directions

Runhua Xu, James Joshi

Increasing incidents of security compromises and privacy leakage have raised serious privacy concerns related to cyberspace. Such privacy concerns have been instrumental in the creation of several regulations and acts to restrict the availability and use of privacy-sensitive data. The secure computation problem, initially and formally introduced as secure two-party computation by Andrew Yao in 1986, has been the focus of intense research in academia because of its fundamental role in building many of the existing privacy-preserving approaches. Most of the existing secure computation solutions rely on garbled-circuits and homomorphic encryption techniques to tackle secure computation issues, including efficiency and security guarantees. However, it is still challenging to adopt these secure computation approaches in emerging compute-intensive and data-intensive applications such as emerging machine learning solutions. Recently proposed functional encryption scheme has shown its promise as an underlying secure computation foundation in recent privacy-preserving machine learning approaches proposed. This paper revisits the secure computation problem using emerging and promising functional encryption techniques and presents a comprehensive study. We first briefly summarize existing conventional secure computation approaches built on garbled-circuits, oblivious transfer, and homomorphic encryption techniques. Then, we elaborate on the unique characteristics and challenges of emerging functional encryption based secure computation approaches and outline several research directions.

CRSep 5, 2020

NF-Crowd: Nearly-free Blockchain-based Crowdsourcing

Chao Li, Balaji Palanisamy, Runhua Xu et al.

Advancements in distributed ledger technologies are rapidly driving the rise of decentralized crowdsourcing systems on top of open smart contract platforms like Ethereum. While decentralized blockchain-based crowdsourcing provides numerous benefits compared to centralized solutions, current implementations of decentralized crowdsourcing suffer from fundamental scalability limitations by requiring all participants to pay a small transaction fee every time they interact with the blockchain. This increases the cost of using decentralized crowdsourcing solutions, resulting in a total payment that could be even higher than the price charged by centralized crowdsourcing platforms. This paper proposes a novel suite of protocols called NF-Crowd that resolves the scalability issue by reducing the lower bound of the total cost of a decentralized crowdsourcing project to O(1). NF-Crowd is a highly reliable solution for scaling decentralized crowdsourcing. We prove that as long as participants of a project powered by NF-Crowd are rational, the O(1) lower bound of cost could be reached regardless of the scale of the crowd. We also demonstrate that as long as at least one participant of a project powered by NF-Crowd is honest, the project cannot be aborted and the results are guaranteed to be correct. We design NF-Crowd protocols for a representative type of project named crowdsourcing contest with open community review (CC-OCR). We implement the protocols over the Ethereum official test network. Our results demonstrate that NF-Crowd protocols can reduce the cost of running a CC-OCR project to less than $2 regardless of the scale of the crowd, providing a significant cost benefit in adopting decentralized crowdsourcing solutions.

CRDec 12, 2019

HybridAlpha: An Efficient Approach for Privacy-Preserving Federated Learning

Runhua Xu, Nathalie Baracaldo, Yi Zhou et al.

Federated learning has emerged as a promising approach for collaborative and privacy-preserving learning. Participants in a federated learning process cooperatively train a model by exchanging model parameters instead of the actual training data, which they might want to keep private. However, parameter interaction and the resulting model still might disclose information about the training data used. To address these privacy concerns, several approaches have been proposed based on differential privacy and secure multiparty computation (SMC), among others. They often result in large communication overhead and slow training time. In this paper, we propose HybridAlpha, an approach for privacy-preserving federated learning employing an SMC protocol based on functional encryption. This protocol is simple, efficient and resilient to participants dropping out. We evaluate our approach regarding the training time and data volume exchanged using a federated learning process to train a CNN on the MNIST data set. Evaluation against existing crypto-based SMC solutions shows that HybridAlpha can reduce the training time by 68% and data transfer volume by 92% on average while providing the same model performance and privacy guarantees as the existing solutions.

CRApr 15, 2019

CryptoNN: Training Neural Networks over Encrypted Data

Runhua Xu, James B. D. Joshi, Chao Li

Emerging neural networks based machine learning techniques such as deep learning and its variants have shown tremendous potential in many application domains. However, they raise serious privacy concerns due to the risk of leakage of highly privacy-sensitive data when data collected from users is used to train neural network models to support predictive tasks. To tackle such serious privacy concerns, several privacy-preserving approaches have been proposed in the literature that use either secure multi-party computation (SMC) or homomorphic encryption (HE) as the underlying mechanisms. However, neither of these cryptographic approaches provides an efficient solution towards constructing a privacy-preserving machine learning model, as well as supporting both the training and inference phases. To tackle the above issue, we propose a CryptoNN framework that supports training a neural network model over encrypted data by using the emerging functional encryption scheme instead of SMC or HE. We also construct a functional encryption scheme for basic arithmetic computation to support the requirement of the proposed CryptoNN framework. We present performance evaluation and security analysis of the underlying crypto scheme and show through our experiments that CryptoNN achieves accuracy that is similar to those of the baseline neural network models on the MNIST dataset.

CRFeb 18, 2019

Scalable and Privacy-preserving Design of On/Off-chain Smart Contracts

Chao Li, Balaji Palanisamy, Runhua Xu

The rise of smart contract systems such as Ethereum has resulted in a proliferation of blockchain-based decentralized applications including applications that store and manage a wide range of data. Current smart contracts are designed to be executed solely by miners and are revealed entirely on-chain, resulting in reduced scalability and privacy. In this paper, we discuss that scalability and privacy of smart contracts can be enhanced by splitting a given contract into an off-chain contract and an on-chain contract. Specifically, functions of the contract that involve high-cost computation or sensitive information can be split and included as the off-chain contract, that is signed and executed by only the interested participants. The proposed approach allows the participants to reach unanimous agreement off-chain when all of them are honest, allowing computing resources of miners to be saved and content of the off-chain contract to be hidden from the public. In case of a dispute caused by any dishonest participants, a signed copy of the off-chain contract can be revealed so that a verified instance can be created to make miners enforce the true execution result. Thus, honest participants have the ability to redress and penalize any fraudulent or dishonest behavior, which incentivizes all participants to honestly follow the agreed off-chain contract. We discuss techniques for splitting a contract into a pair of on/off-chain contracts and propose a mechanism to address the challenges of handling dishonest participants in the system. Our implementation and evaluation of the proposed approach using an example smart contract demonstrate the effectiveness of the proposed approach in Ethereum.