AdaTracker: Learning Adaptive In-Context Policy for Cross-Embodiment Active Visual Tracking
This work addresses poor scalability and limited generalization in active visual tracking for robotics. The solution is incremental: it builds on existing policy learning methods and adds novel adaptations for embodiment-specific constraints.
The paper tackles active visual tracking across diverse robots with varying physical constraints. It proposes AdaTracker, an adaptive in-context policy learning framework that enables zero-shot adaptation to unseen embodiments and significantly outperforms state-of-the-art methods in cross-embodiment generalization, sample efficiency, and zero-shot adaptation.
Realizing active visual tracking with a single unified model across diverse robots is challenging, as physical constraints and motion dynamics vary drastically from one platform to another. Existing approaches typically train a separate model for each embodiment, leading to poor scalability and limited generalization. To address this, we propose AdaTracker, an adaptive in-context policy learning framework that robustly tracks targets on diverse robot morphologies. Our key insight is to explicitly model embodiment-specific constraints through an Embodiment Context Encoder, which infers them from history. The resulting contextual representation dynamically modulates a Context-Aware Policy, enabling it to infer optimal control actions for unseen embodiments in a zero-shot manner. To enhance robustness, we introduce two auxiliary objectives that ensure accurate context identification and temporal consistency. Experiments in both simulation and the real world demonstrate that AdaTracker significantly outperforms state-of-the-art methods in cross-embodiment generalization, sample efficiency, and zero-shot adaptation.
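The data flow described in the abstract (history in, context out, context modulating the policy) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the encoder architecture or the modulation mechanism, so the sketch assumes a mean-pooled encoder over past (observation, action) steps and a feature-wise scale-and-shift modulation of the policy's hidden layer. All class names, dimensions, and helper functions are hypothetical, the weights are random rather than trained, and the two auxiliary objectives are not shown.

```python
import math
import random


def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def init_layer(rng, n_out, n_in):
    """Random, untrained weights; a real model would learn these."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class EmbodimentContextEncoder:
    """Hypothetical encoder: mean-pools encoded history steps into a context vector."""

    def __init__(self, rng, step_dim, context_dim):
        self.w, self.b = init_layer(rng, context_dim, step_dim)
        self.context_dim = context_dim

    def __call__(self, history):
        # history: list of per-step vectors (observation and action concatenated)
        if not history:
            return [0.0] * self.context_dim  # no evidence about the embodiment yet
        encoded = [[math.tanh(h) for h in linear(self.w, self.b, step)] for step in history]
        return [sum(col) / len(encoded) for col in zip(*encoded)]


class ContextAwarePolicy:
    """Hypothetical policy whose hidden features are scaled and shifted by the context."""

    def __init__(self, rng, obs_dim, context_dim, hidden_dim, action_dim):
        self.w_in, self.b_in = init_layer(rng, hidden_dim, obs_dim)
        self.w_scale, self.b_scale = init_layer(rng, hidden_dim, context_dim)
        self.w_shift, self.b_shift = init_layer(rng, hidden_dim, context_dim)
        self.w_out, self.b_out = init_layer(rng, action_dim, hidden_dim)

    def __call__(self, obs, context):
        hidden = [math.tanh(h) for h in linear(self.w_in, self.b_in, obs)]
        scale = linear(self.w_scale, self.b_scale, context)
        shift = linear(self.w_shift, self.b_shift, context)
        # Assumed modulation: a zero context leaves the hidden features unchanged.
        modulated = [(1.0 + s) * h + t for h, s, t in zip(hidden, scale, shift)]
        # tanh keeps each action component in [-1, 1]
        return [math.tanh(a) for a in linear(self.w_out, self.b_out, modulated)]


# Rollout: the context is re-inferred from the growing history at every step.
rng = random.Random(0)
OBS_DIM, ACT_DIM, CTX_DIM, HID_DIM = 4, 2, 3, 8
encoder = EmbodimentContextEncoder(rng, OBS_DIM + ACT_DIM, CTX_DIM)
policy = ContextAwarePolicy(rng, OBS_DIM, CTX_DIM, HID_DIM, ACT_DIM)

history = []
obs = [0.1, -0.2, 0.3, 0.0]  # placeholder observation; a real loop would read sensors
for _ in range(5):
    context = encoder(history)
    action = policy(obs, context)
    history.append(obs + action)
```

Because the context is recomputed from history rather than supplied as a robot label, the same weights can in principle be applied to an unseen embodiment without retraining, which is the sense of "zero-shot" used above.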