RO · CV · LG · Apr 1, 2024

Entity-Centric Reinforcement Learning for Object Manipulation from Pixels

arXiv:2404.01220v1 · 34 citations · h-index: 39 · Has Code · ICLR
Originality: Highly original
AI Analysis

This work addresses the curse of dimensionality in visual RL for robotics and enables more complex object-manipulation tasks, though its improvement in generalization is incremental.

The paper tackles the challenge of scaling reinforcement learning for object manipulation from raw images. It proposes an entity-centric approach that handles multiple objects and the dependencies between them, producing agents that are trained with 3 objects and generalize to tasks with over 10 objects.

Manipulating objects is a hallmark of human intelligence, and an important task in domains such as robotics. In principle, Reinforcement Learning (RL) offers a general approach to learn object manipulation. In practice, however, domains with more than a few objects are difficult for RL agents due to the curse of dimensionality, especially when learning from raw image observations. In this work we propose a structured approach for visual RL that is suitable for representing multiple objects and their interaction, and use it to learn goal-conditioned manipulation of several objects. Key to our method is the ability to handle goals with dependencies between the objects (e.g., moving objects in a certain order). We further relate our architecture to the generalization capability of the trained agent, based on a theoretical result for compositional generalization, and demonstrate agents that learn with 3 objects but generalize to similar tasks with over 10 objects. Videos and code are available on the project website: https://sites.google.com/view/entity-centric-rl
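To illustrate why an entity-centric architecture can be applied to more objects than it was trained with, here is a minimal pure-Python sketch. It assumes a Transformer-style design (our assumption for illustration, not the authors' published architecture): each object is a feature vector ("entity"), self-attention models object-object interactions, cross-attention conditions each entity on a set of goal entities, and mean pooling yields a single goal-conditioned value. All names (`EntityCentricCritic`, `attend`, `random_entities`) are hypothetical, the weights are random rather than trained, and the random entity features merely stand in for an object-centric encoding of pixels.

```python
import math
import random


def matvec(W, x):
    # Multiply matrix W (a list of rows) by vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys_values, Wq, Wk, Wv):
    # Scaled dot-product attention: every query entity attends over all key/value entities.
    d = len(Wq)
    keys = [matvec(Wk, e) for e in keys_values]
    values = [matvec(Wv, e) for e in keys_values]
    out = []
    for entity in queries:
        q = matvec(Wq, entity)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values)) for i in range(d)])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def random_entities(n, dim, rng):
    # Hypothetical stand-in for per-object features extracted from pixels.
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]


class EntityCentricCritic:
    """Hypothetical goal-conditioned value function over a set of entity vectors (untrained)."""

    def __init__(self, entity_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        # Self-attention weights (query, key, value): object-object interactions.
        self.self_w = [random_matrix(hidden_dim, entity_dim, rng) for _ in range(3)]
        # Cross-attention weights: state entities attend to goal entities.
        self.cross_q = random_matrix(hidden_dim, hidden_dim, rng)
        self.cross_k = random_matrix(hidden_dim, entity_dim, rng)
        self.cross_v = random_matrix(hidden_dim, entity_dim, rng)
        self.readout = [rng.gauss(0.0, 0.5) for _ in range(hidden_dim)]

    def value(self, state_entities, goal_entities):
        # 1) Entity-entity interactions via self-attention (permutation-equivariant).
        h = attend(state_entities, state_entities, *self.self_w)
        # 2) Condition each state entity on the goal via cross-attention.
        g = attend(h, goal_entities, self.cross_q, self.cross_k, self.cross_v)
        # 3) Mean-pool over entities: output size is fixed for any entity count,
        #    and the order of the entities does not matter.
        pooled = [sum(col) / len(g) for col in zip(*g)]
        return sum(w * p for w, p in zip(self.readout, pooled))


rng = random.Random(1)
critic = EntityCentricCritic(entity_dim=4, hidden_dim=8)
v3 = critic.value(random_entities(3, 4, rng), random_entities(3, 4, rng))
v10 = critic.value(random_entities(10, 4, rng), random_entities(10, 4, rng))
print(v3, v10)  # same parameters, 3 objects vs. 10 objects
```

Because every weight acts per entity and the pooling is permutation-invariant, the same parameters accept 3 or 10 objects in any order. This only shows why such a model can be evaluated on larger scenes; whether it performs well there is what the paper's compositional-generalization result and experiments address. Mean pooling is used here so the output scale does not grow with the number of objects.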

Code Implementations: 1 repo
