Xinyang Gu

LG
h-index9
5papers
234citations
Novelty59%
AI Score31

5 Papers

ROAug 26, 2024
Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning

Xinyang Gu, Yen-Jen Wang, Xiang Zhu et al.

Humanoid robots, with their human-like skeletal structure, are especially suited for tasks in human-centric environments. However, this structure is accompanied by additional challenges in locomotion controller design, especially in complex real-world environments. As a result, existing humanoid robots are limited to relatively simple terrains, either with model-based control or model-free reinforcement learning. In this work, we introduce Denoising World Model Learning (DWL), an end-to-end reinforcement learning framework for humanoid locomotion control, which demonstrates the world's first humanoid robot to master real-world challenging terrains such as snowy and inclined land in the wild, up and down stairs, and extremely uneven terrains. All scenarios run the same learned neural network with zero-shot sim-to-real transfer, indicating the superior robustness and generalization capability of the proposed method.

ROApr 8, 2024
Humanoid-Gym: Reinforcement Learning for Humanoid Robot with Zero-Shot Sim2Real Transfer

Xinyang Gu, Yen-Jen Wang, Jianyu Chen

Humanoid-Gym is an easy-to-use reinforcement learning (RL) framework based on Nvidia Isaac Gym, designed to train locomotion skills for humanoid robots, emphasizing zero-shot transfer from simulation to the real-world environment. Humanoid-Gym also integrates a sim-to-sim framework from Isaac Gym to Mujoco that allows users to verify the trained policies in different physical simulations to ensure the robustness and generalization of the policies. This framework is verified by RobotEra's XBot-S (1.2-meter tall humanoid robot) and XBot-L (1.65-meter tall humanoid robot) in a real-world environment with zero-shot sim-to-real transfer. The project website and source code can be found at: https://sites.google.com/view/humanoid-gym/.

LGDec 20, 2019
Soft Q Network

Jingbin Liu, Shuai Liu, Xinyang Gu

Deep Q Network (DQN) is a very successful algorithm, yet the inherent problem of reinforcement learning, i.e. the exploit-explore balance, remains. In this work, we introduce entropy regularization into DQN and propose SQN. We find that the backup equation of soft Q learning can enjoy the corrective feedback if we view the soft backup as policy improvement in the form of Q, instead of policy evaluation. We show that Soft Q Learning with Corrective Feedback (SQL-CF) underlies the on-plicy nature of SQL and the equivalence of SQL and Soft Policy Gradient (SPG). With these insights, we propose an on-policy version of deep Q learning algorithm, i.e. Q On-Policy (QOP). We experiment with QOP on a self-play environment called Google Research Football (GRF). The QOP algorithm exhibits great stability and efficiency in training GRF agents.

LGDec 2, 2019
Policy Optimization Reinforcement Learning with Entropy Regularization

Jingbin Liu, Xinyang Gu, Shuai Liu

Entropy regularization is an important idea in reinforcement learning, with great success in recent algorithms like Soft Q Network (SQN) and Soft Actor-Critic (SAC1). In this work, we extend this idea into the on-policy realm. We propose the soft policy gradient theorem (SPGT) for on-policy maximum entropy reinforcement learning. With SPGT, a series of new policy optimization algorithms are derived, such as SPG, SA2C, SA3C, SDDPG, STRPO, SPPO, SIMPALA and so on. We find that SDDPG is equivalent to SAC1. For policy gradient, the policy network is often represented as a Gaussian distribution with a global action variance, which damages the representation capacity. We introduce a local action variance for policy network and find it can work collaboratively with the idea of entropy regularization. Our method outperforms prior works on a range of benchmark tasks. Furthermore, our method can be easily extended to large scale experiment with great stability and parallelism.

AIAug 30, 2019
Reinforcement learning with world model

Jingbin Liu, Xinyang Gu, Shuai Liu

Nowadays, model-free reinforcement learning algorithms have achieved remarkable performance on many decision making and control tasks, but high sample complexity and low sample efficiency still hinder the wide use of model-free reinforcement learning algorithms. In this paper, we argue that if we intend to design an intelligent agent that learns fast and transfers well, the agent must be able to reflect key elements of intelligence, like intuition, Memory, PredictionandCuriosity. We propose an agent framework that integrates off-policy reinforcement learning with world model learning, so as to embody the important features of intelligence in our algorithm design. We adopt the state-of-art model-free reinforcement learning algorithm, Soft Actor-Critic, as the agent intuition, and world model learning through RNN to endow the agent with memory, curiosity, and the ability to predict. We show that these ideas can work collaboratively with each other and our agent (RMC) can give new state-of-art results while maintaining sample efficiency and training stability. Moreover, our agent framework can be easily extended from MDP to POMDP problems without performance loss.