7 Papers

93.9 | CL | Apr 27 | Code
DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents

Junshuo Zhang, Chengrui Huang, Feng Guo et al.

Large language model (LLM) agents that follow the sequential "reason-then-act" paradigm have achieved superior performance in many complex tasks. However, these methods suffer from limited exploration and incomplete environmental understanding, as they interact with only a single environment per step. In this paper, we first introduce a novel paradigm that enables an agent to interact with multiple environments simultaneously and share cross-trajectory experiences. Building upon this paradigm, we further propose DPEPO, a reinforcement learning (RL) algorithm that encourages the agent to perform diverse parallel exploration. There are two stages in DPEPO: initial supervised fine-tuning (SFT) imparts basic parallel reasoning and action generation, followed by a reinforcement learning stage with a hierarchical reward scheme. We design a parallel trajectory-level success reward and two step-level rewards: Diverse Action Reward and Diverse State Transition Reward, which actively penalize behavioral redundancy and promote broad exploration. Extensive experiments on ALFWorld and ScienceWorld show that DPEPO achieves state-of-the-art (SOTA) success rates, while maintaining comparable efficiency to strong sequential baselines. (Code is available at https://github.com/LePanda026/Code-for-DPEPO)

LG | Feb 12
Towards Performance-Enhanced Model-Contrastive Federated Learning using Historical Information in Heterogeneous Scenarios

Hongliang Zhang, Jiguo Yu, Guijuan Wang et al.

Federated Learning (FL) enables multiple nodes to collaboratively train a model without sharing raw data. However, FL systems are usually deployed in heterogeneous scenarios, where nodes differ in both data distributions and participation frequencies, which undermines the FL performance. To tackle the above issue, this paper proposes PMFL, a performance-enhanced model-contrastive federated learning framework using historical training information. Specifically, on the node side, we design a novel model-contrastive term into the node optimization objective by incorporating historical local models to capture stable contrastive points, thereby improving the consistency of model updates in heterogeneous data distributions. On the server side, we utilize the cumulative participation count of each node to adaptively adjust its aggregation weight, thereby correcting the bias in the global objective caused by different node participation frequencies. Furthermore, the updated global model incorporates historical global models to reduce its fluctuations in performance between adjacent rounds. Extensive experiments demonstrate that PMFL achieves superior performance compared with existing FL methods in heterogeneous scenarios.

LG | Oct 23, 2025
ADP-VRSGP: Decentralized Learning with Adaptive Differential Privacy via Variance-Reduced Stochastic Gradient Push

Xiaoming Wu, Teng Liu, Xin Wang et al.

Differential privacy is widely employed in decentralized learning to safeguard sensitive data by introducing noise into model updates. However, existing approaches that use fixed-variance noise often degrade model performance and reduce training efficiency. To address these limitations, we propose a novel approach called decentralized learning with adaptive differential privacy via variance-reduced stochastic gradient push (ADP-VRSGP). This method dynamically adjusts both the noise variance and the learning rate using a stepwise-decaying schedule, which accelerates training and enhances final model performance while providing node-level personalized privacy guarantees. To counteract the slowed convergence caused by large-variance noise in early iterations, we introduce a progressive gradient fusion strategy that leverages historical gradients. Furthermore, ADP-VRSGP incorporates decentralized push-sum and aggregation techniques, making it particularly suitable for time-varying communication topologies. Through rigorous theoretical analysis, we demonstrate that ADP-VRSGP achieves robust convergence with an appropriate learning rate, significantly improving training stability and speed. Experimental results validate that our method outperforms existing baselines across multiple scenarios, highlighting its efficacy in addressing the challenges of privacy-preserving decentralized learning.
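
The push-sum building block mentioned in the abstract above can be illustrated with a minimal sketch. This is plain push-sum averaging only, not ADP-VRSGP itself: it adds no privacy noise, uses no gradients or variance reduction, and assumes a fixed, strongly connected directed graph rather than a time-varying topology. The function name and the example graph are hypothetical.

```python
def push_sum_average(values, out_neighbors, rounds=200):
    """Estimate the network-wide average of `values` with push-sum.

    Illustrative assumption: `out_neighbors[i]` lists the nodes that node i
    can send to, and the directed graph is strongly connected.
    """
    n = len(values)
    x = [float(v) for v in values]  # running value mass held by each node
    w = [1.0] * n                   # running weight mass held by each node
    for _ in range(rounds):
        new_x = [0.0] * n
        new_w = [0.0] * n
        for i in range(n):
            # Each node splits its mass equally among its out-neighbors
            # and itself, so total mass is conserved (column-stochastic mixing).
            targets = list(out_neighbors[i]) + [i]
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    # The ratio x_i / w_i corrects the bias introduced by directed links.
    return [xi / wi for xi, wi in zip(x, w)]


# Hypothetical 4-node directed graph: 0 -> 1 -> 2 -> 3 -> 0, plus 2 -> 0.
graph = {0: [1], 1: [2], 2: [0, 3], 3: [0]}
estimates = push_sum_average([1.0, 2.0, 3.0, 6.0], graph)
```

Every node's ratio approaches the true average of the initial values even though the links are one-directional, which is why push-sum suits directed communication graphs.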

LG | Jan 21, 2025
Learning Dynamic Representations via An Optimally-Weighted Maximum Mean Discrepancy Optimization Framework for Continual Learning

KaiHui Huang, RunQing Wu, JinHui Shen et al.

Continual learning has emerged as a pivotal area of research, primarily due to its advantageous characteristic that allows models to persistently acquire and retain information. However, catastrophic forgetting can severely impair model performance. In this study, we address network forgetting by introducing a novel framework termed Optimally-Weighted Maximum Mean Discrepancy (OWMMD), which imposes penalties on representation alterations via a Multi-Level Feature Matching Mechanism (MLFMM). Furthermore, we propose an Adaptive Regularization Optimization (ARO) strategy to refine the adaptive weight vectors, which autonomously assess the significance of each feature layer throughout the optimization process. The proposed ARO approach can relieve the over-regularization problem and promote future task learning. We conduct a comprehensive series of experiments, benchmarking our proposed method against several established baselines. The empirical findings indicate that our approach achieves state-of-the-art performance.
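
To make the idea of a weighted, multi-level MMD penalty concrete, here is a minimal sketch under stated assumptions. It uses the standard biased estimator of squared MMD with a Gaussian kernel; the abstract confirms neither the kernel nor the estimator. The per-layer weights are fixed placeholders here, whereas the abstract describes them as refined by ARO. All function names are hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two feature vectors (assumed kernel)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between two samples of vectors."""
    def mean_kernel(ps, qs):
        total = sum(gaussian_kernel(p, q, bandwidth) for p in ps for q in qs)
        return total / (len(ps) * len(qs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def weighted_multilevel_penalty(old_feats, new_feats, weights, bandwidth=1.0):
    """Sum per-layer MMD terms, each scaled by a placeholder layer weight."""
    return sum(
        w * mmd_squared(old, new, bandwidth)
        for w, old, new in zip(weights, old_feats, new_feats)
    )


# Two hypothetical feature layers: the first is unchanged, the second drifted.
layer1_old = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
layer2_old = [[0.0], [0.5], [1.0]]
layer2_new = [[2.0], [2.5], [3.0]]
penalty = weighted_multilevel_penalty(
    [layer1_old, layer2_old], [layer1_old, layer2_new], weights=[0.3, 0.7]
)
```

In this sketch only the drifted layer contributes to the penalty, so a small weight on a layer lets its representation change cheaply while a large weight holds it close to the old one.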

CR | Jun 30, 2021
Extending On-chain Trust to Off-chain -- Trustworthy Blockchain Data Collection using Trusted Execution Environment (TEE)

Chunchi Liu, Hechuan Guo, Minghui Xu et al.

Blockchain creates a secure environment on top of strict cryptographic assumptions and rigorous security proofs. It permits on-chain interactions to achieve trustworthy properties such as traceability, transparency, and accountability. However, current blockchain trustworthiness is only confined to on-chain, creating a "trust gap" to the physical, off-chain environment. This is due to the lack of a scheme that can truthfully reflect the physical world in a real-time and consistent manner. Such an absence hinders further real-world blockchain applications, especially for security-sensitive ones. In this paper, we propose a scheme to extend blockchain trust from on-chain to off-chain, and take trustworthy vaccine transportation as an example. Our scheme consists of 1) a Trusted Execution Environment (TEE)-enabled trusted environment monitoring system built with the Arm Cortex-M33 microcontroller that continuously senses the inside of a vaccine box through trusted sensors and generates anti-forgery data; and 2) a consistency protocol to upload the environment status data from the TEE system to blockchain in a truthful, real-time consistent, continuous and fault-tolerant fashion. Our security analysis indicates that no adversary can tamper with the vaccine in any way without being captured. We carry out an experiment to record the internal status of a vaccine shipping box during transportation, and the results indicate that the proposed system incurs an average latency of 84 ms in local sensing and processing followed by an average latency of 130 ms to have the sensed data transmitted to and available in the blockchain.

SI | Nov 15, 2020
A Distributed Privacy-Preserving Learning Dynamics in General Social Networks

Youming Tao, Shuzhen Chen, Feng Li et al.

In this paper, we study a distributed privacy-preserving learning problem in social networks with general topology. The agents can communicate with each other over the network, which may result in privacy disclosure, since the trustworthiness of the agents cannot be guaranteed. Given a set of options which yield unknown stochastic rewards, each agent is required to learn the best one, aiming at maximizing the resulting expected average cumulative reward. To serve the above goal, we propose a four-staged distributed algorithm which efficiently exploits the collaboration among the agents while preserving the local privacy for each of them. In particular, our algorithm proceeds iteratively, and in every round, each agent i) randomly perturbs its adoption for the privacy-preserving purpose, ii) disseminates the perturbed adoption over the social network in a nearly uniform manner through random walking, iii) selects an option by referring to the perturbed suggestions received from its peers, and iv) decides whether or not to adopt the selected option as preference according to its latest reward feedback. Through solid theoretical analysis, we quantify the trade-off among the number of agents (or communication overhead), privacy preserving and learning utility. We also perform extensive simulations to verify the efficacy of our proposed social learning algorithm.
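
Step i) above, in which an agent randomly perturbs its adoption before sharing it, can be illustrated with k-ary randomized response, a standard local differential privacy mechanism. The abstract does not say which perturbation the authors use, so this is an assumed stand-in and not their method. The function name and parameters are hypothetical.

```python
import math
import random


def perturb_adoption(true_option, num_options, epsilon, rng):
    """Report an option index using k-ary randomized response (assumed mechanism).

    Keeps the true option with probability e^eps / (e^eps + k - 1) and
    otherwise reports one of the other k - 1 options uniformly at random.
    Requires num_options >= 2.
    """
    keep_prob = math.exp(epsilon) / (math.exp(epsilon) + num_options - 1)
    if rng.random() < keep_prob:
        return true_option
    others = [o for o in range(num_options) if o != true_option]
    return rng.choice(others)


# Hypothetical usage: an agent that prefers option 2 out of 4 options.
rng = random.Random(0)
noisy_reports = [perturb_adoption(2, 4, 0.1, rng) for _ in range(50)]
```

A smaller `epsilon` makes each report closer to uniform noise (stronger privacy, weaker signal), which matches the privacy and utility trade-off the abstract says the authors quantify.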

CR | Nov 3, 2020
MalFox: Camouflaged Adversarial Malware Example Generation Based on Conv-GANs Against Black-Box Detectors

Fangtian Zhong, Xiuzhen Cheng, Dongxiao Yu et al.

Deep learning is a thriving field currently stuffed with many practical applications and active research topics. It allows computers to learn from experience and to understand the world in terms of a hierarchy of concepts, with each being defined through its relations to simpler concepts. Relying on the strong capabilities of deep learning, we propose a convolutional generative adversarial network-based (Conv-GAN) framework titled MalFox, targeting adversarial malware example generation against third-party black-box malware detectors. Motivated by the rival game between malware authors and malware detectors, MalFox adopts a confrontational approach to produce perturbation paths, with each formed by up to three methods (namely Obfusmal, Stealmal, and Hollowmal) to generate adversarial malware examples. To demonstrate the effectiveness of MalFox, we collect a large dataset consisting of both malware and benignware programs, and investigate the performance of MalFox in terms of accuracy, detection rate, and evasive rate of the generated adversarial malware examples. Our evaluation indicates that the accuracy can be as high as 99.0% which significantly outperforms the other 12 well-known learning models. Furthermore, the detection rate is dramatically decreased by 56.8% on average, and the average evasive rate is noticeably improved by up to 56.2%.