LG CV RO · May 7, 2025

Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation

Abdulaziz Almuzairee, Rohan Patil, Dwait Bhatt, Henrik I. Christensen

arXiv:2505.04619v29.42 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses deployment challenges in robotic manipulation by making multi-view policies more robust to camera failures and easier to deploy, though it is incremental in nature.

The paper tackles the problem of multi-view visual reinforcement learning for robotic manipulation by introducing the Merge And Disentanglement (MAD) algorithm, which improves sample efficiency and robustness, as demonstrated on Meta-World and ManiSkill3 benchmarks.

Vision is well known for its use in manipulation, especially in visual servoing. Due to the 3D nature of the world, using multiple camera views and merging them creates better representations for Q-learning and, in turn, trains more sample-efficient policies. Nevertheless, these multi-view policies are sensitive to failing cameras and can be burdensome to deploy. To mitigate these issues, we introduce a Merge And Disentanglement (MAD) algorithm that efficiently merges views to increase sample efficiency while simultaneously disentangling views by augmenting multi-view feature inputs with single-view features. This produces robust policies and allows lightweight deployment. We demonstrate the efficiency and robustness of our approach using Meta-World and ManiSkill3. For the project website and code, see https://aalmuzairee.github.io/mad
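
The idea of feeding a downstream learner both merged multi-view features and single-view features can be illustrated with a minimal sketch. This is not the authors' implementation: the `encode` function, the shared linear weights, the array sizes, and merging by summation are all assumptions chosen only to make the example concrete.

```python
import numpy as np

def encode(view, weights):
    """Hypothetical stand-in encoder: flatten an image and project it linearly."""
    return np.tanh(view.reshape(-1) @ weights)

def merge_and_augment(views, weights):
    """Return the merged multi-view feature followed by each single-view feature.

    Merging by summation and sharing one encoder across views are
    assumptions made for this sketch, not details taken from the abstract.
    """
    singles = np.stack([encode(v, weights) for v in views])  # (n_views, feat_dim)
    merged = singles.sum(axis=0, keepdims=True)               # (1, feat_dim)
    # Row 0 is the merged feature; rows 1.. are the single-view features.
    return np.concatenate([merged, singles], axis=0)          # (1 + n_views, feat_dim)

rng = np.random.default_rng(0)
n_views, h, w, feat_dim = 3, 8, 8, 16          # illustrative sizes only
views = rng.random((n_views, h, w))             # fake camera images
weights = rng.normal(scale=0.1, size=(h * w, feat_dim))

features = merge_and_augment(views, weights)
print(features.shape)  # (4, 16)
```

Under these assumptions, a policy or critic trained on every row of `features` sees the merged representation and each camera in isolation, which is one way a policy could remain usable when only a single camera is available at deployment.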
