RO · GR · LG · Aug 14, 2023

Adaptive Tracking of a Single-Rigid-Body Character in Various Environments

arXiv:2308.07491v3 · 7 citations · h-index: 13
Originality: Incremental advance
AI Analysis

This work addresses the challenge of creating sample-efficient, adaptive motion controllers for simulated characters. The advance is incremental: it builds on prior methods such as DeepMimic by introducing a simplified character model.

The paper tackles the problem of generating adaptive full-body character motions in various environments. It simulates a single-rigid-body character using centroidal dynamics and trains a policy with deep reinforcement learning to track reference motions. The resulting policy adapts to unobserved environmental changes and controller transitions without additional learning, and it can be trained within 30 minutes on a laptop.

Since the introduction of DeepMimic [Peng et al. 2018], subsequent research has focused on expanding the repertoire of simulated motions across various scenarios. In this study, we propose an alternative approach for this goal, a deep reinforcement learning method based on the simulation of a single-rigid-body character. Using the centroidal dynamics model (CDM) to express the full-body character as a single rigid body (SRB) and training a policy to track a reference motion, we can obtain a policy that is capable of adapting to various unobserved environmental changes and controller transitions without requiring any additional learning. Due to the reduced dimension of state and action space, the learning process is sample-efficient. The final full-body motion is kinematically generated in a physically plausible way, based on the state of the simulated SRB character. The SRB simulation is formulated as a quadratic programming (QP) problem, and the policy outputs an action that allows the SRB character to follow the reference motion. We demonstrate that our policy, efficiently trained within 30 minutes on an ultraportable laptop, has the ability to cope with environments that have not been experienced during learning, such as running on uneven terrain or pushing a box, and transitions between learned policies, without any additional learning.
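To make the single-rigid-body idea concrete, the sketch below advances one rigid body under gravity and a set of contact forces using the Newton-Euler equations. It is an illustration only and not the paper's QP-based simulation: the function name `srb_step`, the y-up gravity vector, the semi-implicit Euler integrator, and the omission of orientation updates are all assumptions made for brevity. In the paper's setting, the contact forces would come from the QP and the policy's action rather than being supplied by hand.

```python
import numpy as np

# Assumed y-up world frame; the paper does not specify this convention here.
GRAVITY = np.array([0.0, -9.81, 0.0])


def srb_step(mass, inertia_world, com, lin_vel, ang_vel,
             contact_points, contact_forces, dt):
    """Advance a single rigid body by one semi-implicit Euler step.

    Hypothetical helper for illustration. Orientation integration is
    omitted, so inertia_world is treated as constant over the step.
    """
    com = np.asarray(com, dtype=float)
    lin_vel = np.asarray(lin_vel, dtype=float)
    ang_vel = np.asarray(ang_vel, dtype=float)
    points = np.asarray(contact_points, dtype=float).reshape(-1, 3)
    forces = np.asarray(contact_forces, dtype=float).reshape(-1, 3)

    # Newton: net force is gravity plus the sum of contact forces.
    net_force = mass * GRAVITY + forces.sum(axis=0)
    # Net torque about the center of mass from each contact force.
    net_torque = np.cross(points - com, forces).sum(axis=0)

    lin_acc = net_force / mass
    # Euler: I * dw/dt = tau - w x (I w)
    gyro = np.cross(ang_vel, inertia_world @ ang_vel)
    ang_acc = np.linalg.solve(inertia_world, net_torque - gyro)

    # Semi-implicit Euler: update velocities first, then position.
    new_lin_vel = lin_vel + dt * lin_acc
    new_ang_vel = ang_vel + dt * ang_acc
    new_com = com + dt * new_lin_vel
    return new_com, new_lin_vel, new_ang_vel
```

For example, calling `srb_step` with a single contact directly below the center of mass and an upward force equal to the body's weight leaves a resting body at rest, since both net force and net torque vanish. The low dimension of this state (position, velocity, angular velocity, plus orientation in a full implementation) is what the abstract credits for the sample-efficient learning.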

