Learning Quadruped Walking from Seconds of Demonstration
This work addresses the data cost of training quadruped locomotion policies, offering an imitation learning method that needs only a few seconds of demonstration.
This paper explores why imitation learning is effective for quadruped locomotion with small datasets, attributing it to the structure of limit cycles and the local numerical properties of neural networks. It proposes a new imitation learning method that regulates the alignment between variations in a latent space and variations in the output actions, enabling locomotion policies to be trained offline from a few seconds of demonstration.
Quadruped locomotion provides a natural setting for understanding when model-free learning can outperform model-based control design, by exploiting data patterns to bypass the difficulty of optimizing over discrete contacts and the combinatorial explosion of mode changes. We give a principled analysis of why imitation learning for quadrupeds can be inherently effective in a small data regime, based on the structure of their limit cycles, Poincaré return maps, and local numerical properties of neural networks. This understanding motivates a new imitation learning method that regulates the alignment between variations in a latent space and those over the output actions. Hardware experiments confirm that a few seconds of demonstration is sufficient to train various locomotion policies from scratch, entirely offline, with reasonable robustness.
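To make the notion of a limit cycle and its Poincaré return map concrete, the following is a minimal sketch on a toy system, not the quadruped dynamics or the analysis of this paper. It assumes a Hopf normal-form oscillator in polar coordinates, dr/dt = r(1 - r^2), dθ/dt = ω, whose limit cycle is the unit circle. The function names (`radial_rate`, `return_map`, `return_map_slope`) and the choice ω = 2π are illustrative assumptions. The return map sends a radius on the section θ = 0 to the radius one revolution later; a slope with magnitude below 1 at the fixed point indicates that small perturbations off the cycle shrink at every return.

```python
import math


def radial_rate(r):
    """Radial dynamics of the toy oscillator: dr/dt = r * (1 - r^2)."""
    return r * (1.0 - r * r)


def return_map(r0, omega=2.0 * math.pi, steps=1000):
    """Poincare return map on the section theta = 0.

    The angle advances at constant rate omega, so one revolution takes
    2*pi/omega seconds. The radius is integrated over that time with
    fixed-step RK4.
    """
    period = 2.0 * math.pi / omega
    dt = period / steps
    r = r0
    for _ in range(steps):
        k1 = radial_rate(r)
        k2 = radial_rate(r + 0.5 * dt * k1)
        k3 = radial_rate(r + 0.5 * dt * k2)
        k4 = radial_rate(r + dt * k3)
        r += dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r


def return_map_slope(r_star=1.0, eps=1e-5):
    """Central finite-difference slope of the return map at r_star."""
    return (return_map(r_star + eps) - return_map(r_star - eps)) / (2.0 * eps)


if __name__ == "__main__":
    r = 0.2  # start well inside the limit cycle
    for k in range(6):
        print(f"return {k}: r = {r:.6f}")
        r = return_map(r)
    # For this toy system the slope at r = 1 equals exp(-2 * period).
    print("slope at fixed point:", return_map_slope())
```

In this toy case the fixed point of the return map is r = 1 and the slope there is exp(-2) for a one-second period, so deviations from the cycle contract by roughly a factor of seven per revolution. This contraction is the kind of structure the abstract appeals to, shown here only on a two-dimensional stand-in.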