Efficient Training of Generalizable Visuomotor Policies via Control-Aware Augmentation
This work addresses how well robot and embodied-AI policies generalize when deployed in diverse environments; it is an incremental improvement over existing data augmentation methods.
The paper tackles the challenge of improving generalization in embodied AI by introducing EAGLE, a training framework that uses control-aware augmentation and knowledge distillation to enhance visuomotor policies, achieving strong performance on benchmarks like DMControl and robot manipulation tasks.
Improving generalization is a key challenge in embodied AI, where obtaining large-scale datasets across diverse scenarios is costly. Traditional weak augmentations, such as cropping and flipping, are insufficient for improving a model's performance in new environments. Existing data augmentation methods often disrupt task-relevant information in images, potentially degrading performance. To overcome these challenges, we introduce EAGLE, an efficient training framework for generalizable visuomotor policies that improves upon existing methods in two ways: (1) it enhances generalization by applying augmentation only to control-related regions identified through a self-supervised control-aware mask, and (2) it improves training stability and efficiency by distilling knowledge from an expert into a visuomotor student policy, which is then deployed to unseen environments without further fine-tuning. Comprehensive experiments on three domains (the DMControl Generalization Benchmark, the enhanced Robot Manipulation Distraction Benchmark, and a long-sequential drawer-opening task) validate the effectiveness of our method.
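The two components named in the abstract can be illustrated with a minimal NumPy sketch. It is not the EAGLE implementation: the function names `masked_augment` and `distillation_loss`, the hand-made binary mask, the random stand-in for an augmented view, and the mean-squared-error objective are all assumptions made for illustration. In the paper the mask is learned in a self-supervised way, and the abstract does not specify the distillation loss.

```python
import numpy as np

def masked_augment(image, augmented, mask):
    """Blend `augmented` into `image` only where `mask` is 1.

    image, augmented: float arrays of shape (H, W, C)
    mask: array of shape (H, W) with values in [0, 1]; a hypothetical
          stand-in for the learned control-aware mask.
    """
    m = mask[..., None].astype(image.dtype)  # (H, W) -> (H, W, 1) for broadcasting
    return m * augmented + (1.0 - m) * image

def distillation_loss(student_actions, expert_actions):
    """Mean squared error between student and expert actions
    (an assumed objective, chosen here only for simplicity)."""
    return float(np.mean((student_actions - expert_actions) ** 2))

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))       # toy observation
augmented = rng.random((4, 4, 3))   # stand-in for a strongly augmented view
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0                # hand-made region where augmentation applies

mixed = masked_augment(image, augmented, mask)
loss = distillation_loss(np.array([0.1, 0.2]), np.array([0.1, 0.4]))
```

Pixels outside the mask are left unchanged, so the augmentation cannot alter them; the student would then be trained on `mixed` to match the expert's actions.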