LoopSR: Looping Sim-and-Real for Lifelong Policy Adaptation of Legged Robots
This work addresses the challenge of maintaining high performance in specific environments for legged robots, though it appears incremental, since it builds on existing sim-to-real and adaptation methods.
The paper tackles the suboptimal real-world deployment of legged robot policies caused by trade-offs in sim-to-real transfer. It proposes LoopSR, a lifelong adaptation framework that continuously refines policies after deployment, and reports superior data efficiency and strong performance in its experiments.
Reinforcement Learning (RL) has shown remarkable and generalizable capability in legged locomotion through sim-to-real transfer. However, while adaptive methods like domain randomization are expected to enhance policy robustness across diverse environments, they can compromise the policy's performance in any specific environment, leading to suboptimal real-world deployment, in line with the No Free Lunch theorem. To address this, we propose LoopSR, a lifelong policy adaptation framework that continuously refines RL policies in the post-deployment stage. LoopSR employs a transformer-based encoder to map real-world trajectories into a latent space and reconstructs a digital twin of the real world for further improvement. An autoencoder architecture and contrastive learning are adopted to enhance feature extraction of real-world dynamics. Simulation parameters for continual training are derived by combining predicted values from the decoder with parameters retrieved from a pre-collected simulation trajectory dataset. By leveraging simulated continual training, LoopSR achieves superior data efficiency compared with strong baselines, yielding strong performance with limited data in both sim-to-sim and sim-to-real experiments. Please refer to https://peilinwu.site/looping-sim-and-real.github.io/ for videos and code.
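The parameter-derivation step in the abstract, which combines decoder predictions with parameters retrieved from a simulation trajectory dataset, can be illustrated with a minimal sketch. The abstract does not specify the retrieval metric or the combination rule, so everything concrete here is an assumption made for illustration: the Euclidean nearest-neighbour lookup, the averaging over `k` neighbours, the convex blend weighted by `alpha`, the parameter names `friction` and `payload_kg`, and the toy latent vectors, which stand in for the outputs of the transformer encoder and decoder.

```python
import math
from typing import Dict, List, Sequence, Tuple

Params = Dict[str, float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two latent vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_params(query: Sequence[float],
                    dataset: List[Tuple[Sequence[float], Params]],
                    k: int = 2) -> Params:
    """Average the simulation parameters of the k stored latents nearest to query."""
    if not dataset:
        raise ValueError("dataset must contain at least one entry")
    nearest = sorted(dataset, key=lambda item: euclidean(query, item[0]))[:k]
    keys = nearest[0][1].keys()
    return {key: sum(p[key] for _, p in nearest) / len(nearest) for key in keys}


def combine_params(predicted: Params, retrieved: Params,
                   alpha: float = 0.5) -> Params:
    """Convex blend: alpha weights the prediction, 1 - alpha the retrieval.

    This blending rule is an assumption, not the paper's stated method.
    """
    return {key: alpha * predicted[key] + (1.0 - alpha) * retrieved[key]
            for key in predicted}


# Toy pre-collected dataset: (latent vector, simulation parameters) pairs.
# All values are made up for illustration.
dataset = [
    ([0.0, 0.0], {"friction": 0.4, "payload_kg": 0.0}),
    ([1.0, 0.0], {"friction": 0.8, "payload_kg": 2.0}),
    ([5.0, 5.0], {"friction": 1.2, "payload_kg": 4.0}),
]
real_latent = [0.4, 0.1]                          # stand-in for the encoder output
predicted = {"friction": 0.5, "payload_kg": 1.5}  # stand-in for the decoder output

retrieved = retrieve_params(real_latent, dataset, k=2)
sim_params = combine_params(predicted, retrieved, alpha=0.5)
print(sim_params)
```

In this sketch, `alpha` controls how much the direct prediction is trusted relative to the retrieved neighbours. Setting `alpha=1.0` ignores retrieval entirely, while `alpha=0.0` uses only the dataset lookup.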