Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
This work addresses deployment challenges in robotic manipulation by making multi-view policies more robust to camera failures and easier to deploy, though it is incremental in nature.
The paper tackles the problem of multi-view visual reinforcement learning for robotic manipulation by introducing the Merge And Disentanglement (MAD) algorithm, which improves sample efficiency and robustness, as demonstrated on Meta-World and ManiSkill3 benchmarks.
Vision is well established in robotic manipulation, particularly through visual servoing. Because the world is three-dimensional, merging multiple camera views yields better representations for Q-learning and, in turn, more sample-efficient policies. Nevertheless, these multi-view policies are sensitive to failing cameras and can be burdensome to deploy. To mitigate these issues, we introduce the Merge And Disentanglement (MAD) algorithm, which efficiently merges views to increase sample efficiency while simultaneously disentangling views by augmenting multi-view feature inputs with single-view features. This produces robust policies and allows lightweight deployment. We demonstrate the efficiency and robustness of our approach on Meta-World and ManiSkill3. For the project website and code, see https://aalmuzairee.github.io/mad
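The merge-and-disentangle idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the shared `encode` function, the element-wise summation in `merge`, and the helper `build_training_inputs` are all hypothetical choices made for illustration. The sketch only shows how a merged multi-view feature might be augmented with single-view features of the same size, so that a downstream policy could in principle accept either.

```python
from typing import List, Sequence

Feature = List[float]


def encode(view: Sequence[float]) -> Feature:
    """Hypothetical stand-in for a shared per-view encoder (simple scaling)."""
    return [0.5 * x for x in view]


def merge(features: Sequence[Feature]) -> Feature:
    """Merge per-view features by element-wise summation (an assumed operator)."""
    return [sum(vals) for vals in zip(*features)]


def build_training_inputs(views: Sequence[Sequence[float]]) -> List[Feature]:
    """Return the merged feature followed by each single-view feature.

    Training on both kinds of input is the assumed mechanism that lets
    the policy keep working when only one camera is available.
    """
    singles = [encode(v) for v in views]
    return [merge(singles)] + singles


# Three toy "camera views", each already flattened to two numbers.
views = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
inputs = build_training_inputs(views)
```

Because summation preserves the feature dimension, the merged feature and every single-view feature have the same shape, which is what would let a policy trained this way run from a single camera at deployment under these assumptions.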