Xudong Gong

11.5LGJan 11, 2024

Optimistic Model Rollouts for Pessimistic Offline Policy Optimization

Yuanzhao Zhai, Yiying Li, Zijian Gao et al.

Model-based offline reinforcement learning (RL) has made remarkable progress, offering a promising avenue for improving generalization with synthetic model rollouts. Existing works primarily focus on incorporating pessimism for policy optimization, usually via constructing a Pessimistic Markov Decision Process (P-MDP). However, the P-MDP discourages the policies from learning in out-of-distribution (OOD) regions beyond the support of offline datasets, which can under-utilize the generalization ability of dynamics models. In contrast, we propose constructing an Optimistic MDP (O-MDP). We initially observed the potential benefits of optimism brought by encouraging more OOD rollouts. Motivated by this observation, we present ORPO, a simple yet effective model-based offline RL framework. ORPO generates Optimistic model Rollouts for Pessimistic offline policy Optimization. Specifically, we train an optimistic rollout policy in the O-MDP to sample more OOD model rollouts. Then we relabel the sampled state-action pairs with penalized rewards and optimize the output policy in the P-MDP. Theoretically, we demonstrate that the performance of policies trained with ORPO can be lower-bounded in linear MDPs. Experimental results show that our framework significantly outperforms P-MDP baselines by a margin of 30%, achieving state-of-the-art performance on the widely-used benchmark. Moreover, ORPO exhibits notable advantages in problems that require generalization.

3.7CRJul 11, 2013

A Secure Distributed Authentication scheme based on CRT-VSS and Trusted Computing in MANET

Qiwei Lu, Wenchao Huang, Xudong Gong et al.

With the rapid development of MANET, secure and practical authentication is becoming increasingly important. The existing works perform the research from two aspects, i.e., (a)secure key division and distributed storage, (b)secure distributed authentication. But there still exist several unsolved problems. Specifically, it may suffer from cheating problems and fault authentication attack, which can result in authentication failure and DoS attack towards authentication service. Besides, most existing schemes are not with satisfactory efficiency due to exponential arithmetic based on Shamir's scheme. In this paper, we explore the property of verifiable secret sharing(VSS) schemes with Chinese Remainder Theorem (CRT), then propose a secret key distributed storage scheme based on CRT-VSS and trusted computing for MANET. Specifically, we utilize trusted computing technology to solve two existing cheating problems in secret sharing area before. After that, we do the analysis of homomorphism property with CRT-VSS and design the corresponding shares-product sharing scheme with better concision. On such basis, a secure distributed Elliptic Curve-Digital Signature Standard signature (ECC-DSS) authentication scheme based on CRT-VSS scheme and trusted computing is proposed. Furthermore, as an important property of authentication scheme, we discuss the refreshing property of CRT-VSS and do thorough comparisons with Shamir's scheme. Finally, we provide formal guarantees towards our schemes proposed in this paper.

Xudong Gong

2 Papers