Observer-Actor: Active Vision Imitation Learning with Sparse-View Gaussian Splatting
This work addresses occlusion and poor visibility in robotic manipulation tasks, enabling more robust ambidextrous policies. Its contribution is incremental in that it builds on existing imitation learning methods rather than introducing a new one.
The paper tackles active vision imitation learning in robotics by proposing the Observer-Actor framework, which dynamically optimizes camera poses to improve observation clarity. Compared with static-camera setups, trajectory transfer improves by 145% without occlusion and 233% with occlusion, and behavior cloning improves by 75% and 143%, respectively.
We propose Observer Actor (ObAct), a novel framework for active vision imitation learning in which the observer moves to optimal visual observations for the actor. We study ObAct on a dual-arm robotic system equipped with wrist-mounted cameras. At test time, ObAct dynamically assigns observer and actor roles: the observer arm constructs a 3D Gaussian Splatting (3DGS) representation from three images, virtually explores this to find an optimal camera pose, then moves to this pose; the actor arm then executes a policy using the observer's observations. This formulation enhances the clarity and visibility of both the object and the gripper in the policy's observations. As a result, we enable the training of ambidextrous policies on observations that remain closer to the occlusion-free training distribution, leading to more robust policies. We study this formulation with two existing imitation learning methods -- trajectory transfer and behavior cloning -- and experiments show that ObAct significantly outperforms static-camera setups: trajectory transfer improves by 145% without occlusion and 233% with occlusion, while behavior cloning improves by 75% and 143%, respectively. Videos are available at https://obact.github.io.
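The observer's viewpoint search ("virtually explores this to find an optimal camera pose") can be illustrated with a toy sketch. This is not ObAct's implementation. As a stand-in for rendering a 3DGS scene, it assumes the object and gripper are points, occluders are spheres, and a view is scored by the fraction of targets with a clear line of sight. All function names, the hemisphere sampling scheme, and the scoring rule are hypothetical choices made for illustration.

```python
import math

def sample_hemisphere_poses(center, radius, n_azimuth=12, n_elevation=3):
    """Candidate camera positions on an upper hemisphere around `center` (assumed scheme)."""
    poses = []
    for i in range(n_elevation):
        elev = math.radians(20 + 25 * i)  # elevations of 20, 45, 70 degrees
        for j in range(n_azimuth):
            azim = 2 * math.pi * j / n_azimuth
            poses.append((
                center[0] + radius * math.cos(elev) * math.cos(azim),
                center[1] + radius * math.cos(elev) * math.sin(azim),
                center[2] + radius * math.sin(elev),
            ))
    return poses

def segment_hits_sphere(p, q, c, r):
    """True if the segment p -> q passes within distance r of sphere center c."""
    d = [q[k] - p[k] for k in range(3)]
    f = [c[k] - p[k] for k in range(3)]
    dd = sum(x * x for x in d)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, sum(f[k] * d[k] for k in range(3)) / dd))
    closest = [p[k] + t * d[k] for k in range(3)]
    return math.dist(closest, c) < r

def visibility_score(cam, targets, occluders):
    """Fraction of target points with an unobstructed line of sight from `cam`."""
    visible = sum(
        1 for tgt in targets
        if not any(segment_hits_sphere(cam, tgt, c, r) for c, r in occluders)
    )
    return visible / len(targets)

def select_observer_pose(candidates, targets, occluders):
    """Pick the candidate with the highest visibility score (toy proxy for 'optimal')."""
    return max(candidates, key=lambda cam: visibility_score(cam, targets, occluders))

# Toy scene (all values are made up): object at the origin, gripper just above it,
# and one spherical occluder to the +x side.
targets = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.1)]
occluders = [((0.15, 0.0, 0.05), 0.08)]
candidates = sample_hemisphere_poses(center=(0.0, 0.0, 0.0), radius=0.4)

best_pose = select_observer_pose(candidates, targets, occluders)
blocked_pose = (0.4, 0.0, 0.05)  # camera placed directly behind the occluder
```

In this toy scene, `best_pose` sees both the object and the gripper, while `blocked_pose` sees neither. In the framework described above, the observer arm would then move to the selected pose and the actor arm would run its policy on the observer's images; a real system would also need to score views from rendered 3DGS images and check that the pose is reachable, which this sketch omits.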