CV, AI, ML
May 8, 2015

Learning image representations tied to ego-motion

arXiv:1505.02206v2
254 citations
Originality: Incremental advance
AI Analysis

This work addresses the disconnect between visual learning methods and the physical source of their images, a gap that matters for applications such as autonomous driving and robotics. It represents an incremental advance that integrates ego-motion signals into unsupervised feature learning.

The paper tackles the problem of learning visual representations from egocentric video by using proprioceptive motor signals as unsupervised regularization in convolutional neural networks. Across three datasets, the resulting features significantly outperform previous approaches on visual recognition and next-best-view prediction tasks. In the hardest test, features learned from video captured on an autonomous driving platform improve large-scale scene recognition in static images from a disjoint domain.

Understanding how images of objects and scenes behave in response to specific ego-motions is a crucial aspect of proper visual development, yet existing visual learning methods are conspicuously disconnected from the physical source of their images. We propose to exploit proprioceptive motor signals to provide unsupervised regularization in convolutional neural networks to learn visual representations from egocentric video. Specifically, we enforce that our learned features exhibit equivariance, i.e., they respond predictably to transformations associated with distinct ego-motions. With three datasets, we show that our unsupervised feature learning approach significantly outperforms previous approaches on visual recognition and next-best-view prediction tasks. In the most challenging test, we show that features learned from video captured on an autonomous driving platform improve large-scale scene recognition in static images from a disjoint domain.
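
The equivariance property described in the abstract can be illustrated with a minimal NumPy sketch. It assumes, purely for illustration, that each discrete ego-motion g acts on the features through a linear map M_g, so that z(g x) ≈ M_g z(x). The function name `equivariance_loss`, the squared-error penalty, and the random stand-in features are hypothetical and are not taken from the paper or its released code.

```python
import numpy as np


def equivariance_loss(z_before, z_after, motion_ids, maps):
    """Mean squared error between M_g z(x) and z(g x) over frame pairs.

    z_before:   (n, d) features of frames before the ego-motion
    z_after:    (n, d) features of frames after the ego-motion
    motion_ids: (n,)   index of the discrete ego-motion for each pair
    maps:       (k, d, d) one assumed linear map M_g per ego-motion
    """
    # Apply the map of each pair's ego-motion to its "before" feature.
    predicted = np.einsum("nij,nj->ni", maps[motion_ids], z_before)
    return float(np.mean(np.sum((predicted - z_after) ** 2, axis=1)))


# Synthetic stand-in data (an assumption, not real video features).
rng = np.random.default_rng(0)
dim, n_motions, n_pairs = 4, 3, 8
true_maps = rng.normal(size=(n_motions, dim, dim))
z_before = rng.normal(size=(n_pairs, dim))
motion_ids = rng.integers(0, n_motions, size=n_pairs)
# Build "after" features that are exactly equivariant under true_maps.
z_after = np.einsum("nij,nj->ni", true_maps[motion_ids], z_before)

# Loss under the maps that generated the data.
loss_true = equivariance_loss(z_before, z_after, motion_ids, true_maps)

# Identity maps would demand z(g x) = z(x), i.e. invariance.
identity_maps = np.stack([np.eye(dim)] * n_motions)
loss_identity = equivariance_loss(z_before, z_after, motion_ids, identity_maps)

print(loss_true, loss_identity)
```

In this toy setup the loss vanishes under the maps that generated the data, while identity maps, which would correspond to invariance rather than equivariance, leave a positive residual. In a real training setup such a term would be minimized jointly over the network producing z and the maps M_g.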

Code Implementations: 1 repo