LG · Nov 29, 2020

Self-supervised Visual Reinforcement Learning with Object-centric Representations

arXiv:2011.14381v1 · 56 citations
AI Analysis

This work matters for autonomous agents operating in complex, multi-object environments because it gives them a modular, structured observation space in which to acquire skills.

This paper addresses the challenge of acquiring diverse skills for autonomous agents from high-dimensional, unstructured observations, particularly in multi-object environments. The authors propose using object-centric representations learned with a compositional generative world model, which, when combined with goal-conditioned attention policies, enables the agent to discover and learn useful skills for compositional tasks.

Autonomous agents need large repertoires of skills to act reasonably on new tasks that they have not seen before. However, acquiring these skills using only a stream of high-dimensional, unstructured, and unlabeled observations is a tricky challenge for any autonomous agent. Previous methods have used variational autoencoders to encode a scene into a low-dimensional vector that can be used as a goal for an agent to discover new skills. Nevertheless, in compositional/multi-object environments it is difficult to disentangle all the factors of variation into such a fixed-length representation of the whole scene. We propose to use object-centric representations as a modular and structured observation space, which is learned with a compositional generative world model. We show that the structure in the representations in combination with goal-conditioned attention policies helps the autonomous agent to discover and learn useful skills. These skills can be further combined to address compositional tasks like the manipulation of several different objects.
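To make the idea of a goal-conditioned attention policy over object-centric representations more concrete, here is a minimal sketch. It is not the authors' architecture, which the abstract does not specify. It assumes the scene is already encoded as a set of per-object latent vectors (called `slots` here) and that the goal is the latent vector of a single object. It uses plain scaled dot-product attention with no learned projections. The names `goal_attention`, `slots`, and `policy_input` are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def goal_attention(slots, goal):
    """Weight each object slot by its scaled dot-product similarity to the goal.

    slots: list of K object vectors, each of length D (assumed given)
    goal:  goal vector of length D (assumed to describe one object)
    Returns the attention-weighted slot summary and the attention weights.
    """
    dim = len(goal)
    scale = math.sqrt(dim)
    weights = softmax([dot(slot, goal) / scale for slot in slots])
    context = [sum(w * slot[i] for w, slot in zip(weights, slots))
               for i in range(dim)]
    return context, weights


random.seed(0)
K, D = 4, 8  # illustrative sizes: 4 objects, 8-dimensional latents
slots = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
goal = list(slots[2])  # pretend the goal is "object 2 in some target state"

context, weights = goal_attention(slots, goal)

# A policy network would consume the attended summary together with the goal.
policy_input = context + goal
```

The point of the design is that the policy input has a fixed size regardless of how many objects are in the scene, while the attention weights let the goal select which object the agent should act on.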

Code Implementations: 1 repo