AI

Dec 5, 2020

Joint Estimation of Image Representations and their Lie Invariants

Christine Allen-Blanchette, Kostas Daniilidis

arXiv:2012.02903v2

Originality: Highly original

AI Analysis

This work addresses the challenge of automatically extracting disentangled information from high-dimensional image representations, which is a foundational problem for computer vision and robotics.

This paper tackles the problem of disentangling world state (useful for planning and control) from content (useful for classification) in image representations. It proposes two theoretical approaches that jointly estimate the image representations and the generators of the sequence dynamics, which makes it possible to interpolate and extrapolate images from a sequence.

Images encode both the state of the world and its content. The former is useful for tasks such as planning and control, and the latter for classification. The automatic extraction of this information is challenging because of the high dimensionality and entangled encoding inherent to the image representation. This article introduces two theoretical approaches aimed at resolving these challenges. The approaches allow for the interpolation and extrapolation of images from an image sequence by joint estimation of the image representation and the generators of the sequence dynamics. In the first approach, the image representations are learned using probabilistic PCA (Tipping and Bishop, 1999). The linear-Gaussian conditional distributions allow for a closed-form analytical description of the latent distributions but assume the underlying image manifold is a linear subspace. In the second approach, the image representations are learned using probabilistic nonlinear PCA, which relaxes the linear manifold assumption at the cost of requiring a variational approximation of the latent distributions. In both approaches, the underlying dynamics of the image sequence are modelled explicitly to disentangle them from the image representations. The dynamics themselves are modelled with Lie group structure, which enforces the desirable properties of smoothness and composability of inter-image transformations.
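To make the role of the Lie group structure concrete, the following is a minimal sketch and not the paper's model. It assumes a one-parameter group acting linearly on a 2-D latent code, with the planar rotation generator standing in for a learned generator. The names `G`, `z0`, `mat_exp`, and `latent_at` are hypothetical and chosen only for illustration. The sketch shows how a single generator yields both interpolated and extrapolated latent codes, and why the resulting transformations compose.

```python
# Illustrative sketch only: a one-parameter Lie group acting linearly on a
# 2-D latent code. The generator G, the latent z0, and all function names are
# assumptions for illustration, not the paper's actual model.

def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_vec(a, v):
    """Apply matrix a to vector v."""
    return [sum(a[i][k] * v[k] for k in range(len(v))) for i in range(len(a))]

def mat_exp(g, t, terms=30):
    """Approximate exp(t * g) with a truncated Taylor series."""
    n = len(g)
    tg = [[t * g[i][j] for j in range(n)] for i in range(n)]
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # holds (t*g)^k / k!, starting at k = 0
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, tg)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    return result

# Generator of planar rotations (an element of the Lie algebra so(2)).
G = [[0.0, -1.0],
     [1.0, 0.0]]

z0 = [1.0, 0.0]  # hypothetical latent code of the first frame

def latent_at(t):
    """Latent code after flowing for time t along the generator G."""
    return mat_vec(mat_exp(G, t), z0)

z_interp = latent_at(0.5)  # between frames assumed observed at t = 0 and t = 1
z_extrap = latent_at(2.0)  # beyond the assumed observed frames

# Composability: exp(0.5 G) exp(1.5 G) should equal exp(2.0 G).
composed = mat_mul(mat_exp(G, 0.5), mat_exp(G, 1.5))
direct = mat_exp(G, 2.0)
```

Because every transformation has the form exp(tG), varying t continuously gives a smooth path through latent space, and applying two transformations in succession is the same as applying one with the summed parameter. In a full model, each latent code would then be decoded back to an image, which this sketch leaves out.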
