CV, AI
Mar 21, 2022

Disentangling Patterns and Transformations from One Sequence of Images with Shape-invariant Lie Group Transformer

T. Takada, W. Shimaya, Y. Ohmura, Y. Kuniyoshi

arXiv:2203.11210v1
4 citations
h-index: 16

Originality: Highly original

AI Analysis

This work addresses the challenge of enabling robots to understand complex scenes from minimal data, offering a novel algebraic approach to representation learning.

The paper tackles the problem of learning compositional representations of scenes from a single image sequence. It proposes a model that disentangles objects and transformations using shape-invariant Lie group transformers, and shows that the model can discover hidden objects and transformations from one sequence.

An effective way to model the complex real world is to view the world as a composition of basic components of objects and transformations. Although humans come to understand the compositionality of the real world through development, it is extremely difficult to equip robots with such a learning mechanism. In recent years, there has been significant research on autonomously learning representations of the world using deep learning; however, most studies have taken a statistical approach, which requires a large amount of training data. In contrast to such existing methods, we take a novel algebraic approach to representation learning based on a simpler and more intuitive formulation in which the observed world is a combination of multiple independent patterns and transformations that are invariant to the shape of the patterns. Since the shape of a pattern can be viewed as the set of features that are invariant under symmetric transformations such as translation or rotation, we can expect that the patterns can naturally be extracted by expressing transformations with symmetric Lie group transformers and attempting to reconstruct the scene with them. Based on this idea, we propose a model that disentangles scenes into the minimum number of basic components of patterns and Lie transformations from only one sequence of images, by introducing learnable shape-invariant Lie group transformers as transformation components. Experiments show that, given one sequence of images in which two objects are moving independently, the proposed model can discover the hidden distinct objects and the multiple shape-invariant transformations that constitute the scenes.
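The two properties the abstract relies on, that transformations compose as a group and that they leave the shape of a pattern unchanged, can be illustrated with a small sketch. This is not the paper's model, which learns its transformers from images; it only uses hand-written 2D rotations and translations on point sets. All names here (`rotation`, `transform`, `render_scene`, the `triangle` and `segment` patterns, and the motion rates) are hypothetical and chosen for illustration.

```python
import math

def rotation(theta):
    """2x2 rotation matrix, equal to exp(theta * G) for the generator G = [[0, -1], [1, 0]]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transform(points, theta, shift):
    """Rotate every point by theta about the origin, then translate by shift."""
    r = rotation(theta)
    return [(r[0][0] * x + r[0][1] * y + shift[0],
             r[1][0] * x + r[1][1] * y + shift[1]) for x, y in points]

def pairwise_distances(points):
    """Distances between all point pairs: a simple stand-in for 'shape'."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]

# Two illustrative patterns (assumed, not taken from the paper).
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
segment = [(0.0, 0.0), (3.0, 0.0)]

def render_scene(t):
    """Scene at time t: each pattern moves under its own independent transformation."""
    moved_triangle = transform(triangle, 0.3 * t, (0.0, 0.0))  # pure rotation
    moved_segment = transform(segment, 0.0, (0.5 * t, 1.0))    # pure translation
    return moved_triangle + moved_segment

if __name__ == "__main__":
    # Group property: rotating by a and then by b equals rotating by a + b.
    print(matmul(rotation(0.2), rotation(0.5)))
    print(rotation(0.7))
    # Shape invariance: pairwise distances are unchanged by the transformation.
    print(pairwise_distances(triangle))
    print(pairwise_distances(transform(triangle, 1.1, (4.0, -2.0))))
```

In this toy setting the patterns and transformations are given. The setting described in the abstract is the inverse problem: recovering both from the rendered sequence alone.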
