Yu-Ren Liu

LG
3 papers
60 citations
Novelty 40%
AI Score 23

3 Papers

LG · Jun 11, 2023
Learning World Models with Identifiable Factorization

Yu-Ren Liu, Biwei Huang, Zhengmao Zhu et al.

Extracting a stable and compact representation of the environment is crucial for efficient reinforcement learning in high-dimensional, noisy, and non-stationary environments. Different categories of information coexist in such environments, and effectively extracting and disentangling this information remains a challenging problem. In this paper, we propose IFactor, a general framework to model four distinct categories of latent state variables that capture various aspects of information within the RL system, based on their interactions with actions and rewards. Our analysis establishes block-wise identifiability of these latent variables, which not only provides a stable and compact representation but also shows that all reward-relevant factors are significant for policy learning. We further present a practical approach to learning the world model with identifiable blocks, removing redundant information while retaining minimal and sufficient information for policy optimization. Experiments in synthetic worlds demonstrate that our method accurately identifies the ground-truth latent variables, substantiating our theoretical findings. Moreover, experiments in variants of the DeepMind Control Suite and RoboDesk show that our approach outperforms baselines.

LG · May 18, 2021
Reinforcement Learning With Sparse-Executing Actions via Sparsity Regularization

Jing-Cheng Pang, Tian Xu, Shengyi Jiang et al.

Reinforcement learning (RL) has demonstrated impressive performance in decision-making tasks such as embodied control, autonomous driving, and financial trading. In many decision-making tasks, agents must execute actions under limited budgets. However, classic RL methods typically overlook the challenges posed by such sparse-executing actions. They assume that every action can be taken an unlimited number of times, both in the problem formulation and in the design of algorithms. To tackle the issue of limited action execution in RL, this paper first formalizes the problem as a Sparse Action Markov Decision Process (SA-MDP), in which specific actions in the action space can only be executed a limited number of times. Then, we propose a policy optimization algorithm, Action Sparsity REgularization (ASRE), which adaptively handles each action with a distinct preference. ASRE operates in two steps: first, it evaluates action sparsity by constrained action sampling; second, it incorporates the sparsity evaluation into policy learning through an action distribution regularization. We provide a theoretical analysis showing that ASRE converges to a regularized optimal value function. In experiments on tasks with known sparse-executing actions, where classical RL algorithms struggle to train a policy efficiently, ASRE effectively constrains the action sampling and outperforms baselines. Moreover, we show that ASRE generally improves performance in Atari games, demonstrating its broad applicability.
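The budget constraint described in this abstract can be pictured with a minimal Python sketch. It only illustrates the idea that some actions may be executed a limited number of times per episode; it is not the paper's SA-MDP formalism or the ASRE algorithm. The class name `ActionBudget`, its methods, and the example action names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ActionBudget:
    """Tracks how often each budget-limited action has been executed.

    Hypothetical helper for illustration only; actions that do not
    appear in `limits` are treated as unlimited.
    """
    limits: dict
    used: dict = field(default_factory=dict)

    def remaining(self, action):
        # Unlimited actions report an infinite remaining budget.
        limit = self.limits.get(action, float("inf"))
        return limit - self.used.get(action, 0)

    def available(self, actions):
        # Keep only the actions that still have budget left.
        return [a for a in actions if self.remaining(a) > 0]

    def execute(self, action):
        # Refuse to execute an action whose budget is exhausted.
        if self.remaining(action) <= 0:
            raise ValueError(f"budget exhausted for action {action!r}")
        self.used[action] = self.used.get(action, 0) + 1


# Usage: "boost" may be executed twice per episode, "move" is unlimited.
actions = ["move", "boost"]
budget = ActionBudget(limits={"boost": 2})
budget.execute("boost")
budget.execute("boost")
for _ in range(5):
    budget.execute("move")
still_available = budget.available(actions)
```

After the two `boost` executions, only `move` remains available, which is the situation a policy has to plan around when some actions are sparse-executing.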

LG · Dec 31, 2017
ZOOpt: Toolbox for Derivative-Free Optimization

Yu-Ren Liu, Yi-Qi Hu, Hong Qian et al.

Recent advances in derivative-free optimization allow efficient approximation of the globally optimal solutions of sophisticated functions, such as functions with many local optima and non-differentiable or non-continuous functions. This article describes the ZOOpt (Zeroth Order Optimization) toolbox, which provides efficient derivative-free solvers and is designed to be easy to use. ZOOpt provides single-machine parallel optimization built on a Python core, and multi-machine distributed optimization for time-consuming tasks by integrating with the Ray framework, a widely used platform for building distributed applications. ZOOpt focuses in particular on optimization problems in machine learning, addressing high-dimensional and noisy problems such as hyper-parameter tuning and direct policy search. The toolbox is maintained as a ready-to-use tool for real-world machine learning tasks.
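As a rough illustration of what "derivative-free" means here, the sketch below minimizes a non-differentiable function using only function evaluations. It is a plain random local search written with the Python standard library. It is not one of ZOOpt's solvers and does not use ZOOpt's API; the function name `random_search_minimize`, its parameters, and the test objective are assumptions made for this example.

```python
import random


def random_search_minimize(objective, bounds, budget=2000, seed=0):
    """Minimize `objective` over a box using only function evaluations.

    A simple random local search (illustrative, not ZOOpt's algorithm):
    perturb the best point found so far and keep the candidate if it
    improves the objective. No gradients are ever computed.
    """
    rng = random.Random(seed)
    best_x = [rng.uniform(lo, hi) for lo, hi in bounds]
    best_value = objective(best_x)
    trace = [best_value]  # best value after each evaluation
    step = 0.5  # perturbation scale, relative to each box width
    for _ in range(budget - 1):
        candidate = [
            min(hi, max(lo, x + rng.gauss(0.0, step) * (hi - lo)))
            for x, (lo, hi) in zip(best_x, bounds)
        ]
        value = objective(candidate)
        if value < best_value:
            best_x, best_value = candidate, value
        else:
            # Shrink the step after a failed attempt, down to a floor.
            step = max(step * 0.99, 1e-3)
        trace.append(best_value)
    return best_x, best_value, trace


def shifted_l1(x):
    # Non-differentiable at its minimum, where every coordinate is 0.3.
    return sum(abs(xi - 0.3) for xi in x)


bounds = [(-1.0, 1.0), (-1.0, 1.0)]
solution, value, trace = random_search_minimize(shifted_l1, bounds)
```

The returned `trace` never increases, because a candidate is accepted only when it improves on the best value so far. A dedicated toolbox adds better sampling strategies, noise handling, and parallel evaluation on top of this basic loop.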