CV · Aug 3, 2017

Unsupervised Representation Learning by Sorting Sequences

arXiv:1708.01246v1 · 571 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of learning visual representations without labeled data, which matters to researchers and practitioners in computer vision. It proposes a new proxy task, sorting shuffled video frames, that is an incremental advance within unsupervised learning.

The paper tackles unsupervised visual representation learning by training a convolutional neural network to sort shuffled video frames, using temporal coherence as a supervisory signal. The method achieves competitive results against state-of-the-art approaches on action recognition, image classification, and object detection tasks.

We present an unsupervised representation learning approach using videos without semantic labels. We leverage temporal coherence as a supervisory signal by formulating representation learning as a sequence sorting task. We take temporally shuffled frames (i.e., in non-chronological order) as inputs and train a convolutional neural network to sort the shuffled sequences. Similar to comparison-based sorting algorithms, we propose to extract features from all frame pairs and aggregate them to predict the correct order. As sorting a shuffled image sequence requires an understanding of the statistical temporal structure of images, training with such a proxy task allows us to learn rich and generalizable visual representations. We validate the effectiveness of the learned representation by using our method as pre-training on high-level recognition problems. The experimental results show that our method compares favorably against state-of-the-art methods on action recognition, image classification, and object detection tasks.
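The sorting task described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`make_sorting_example`, `pairwise_aggregate`, `recover_order`) are hypothetical, the frames are toy feature vectors rather than CNN outputs, and treating each of the n! permutations as its own class is an assumption made here for simplicity. The sketch shows only how a shuffled clip yields a free supervisory label and how features from all frame pairs can be gathered into one vector for an order classifier.

```python
import itertools
import random


def make_sorting_example(frames, rng):
    """Shuffle a chronologically ordered clip and return (shuffled, label).

    The label is the index of the applied permutation among all n! orderings.
    One class per permutation is an assumption of this sketch.
    """
    perms = list(itertools.permutations(range(len(frames))))
    label = rng.randrange(len(perms))
    order = perms[label]
    shuffled = [frames[i] for i in order]
    return shuffled, label


def pairwise_aggregate(features):
    """Concatenate the features of every frame pair (i < j) into one vector.

    A real model would pass each pair through learned layers first;
    plain concatenation stands in for that step here.
    """
    aggregated = []
    for i, j in itertools.combinations(range(len(features)), 2):
        aggregated.extend(features[i] + features[j])
    return aggregated


def recover_order(shuffled, label):
    """Undo the shuffle given the (predicted) permutation label."""
    order = list(itertools.permutations(range(len(shuffled))))[label]
    restored = [None] * len(shuffled)
    for position, source_index in enumerate(order):
        restored[source_index] = shuffled[position]
    return restored


rng = random.Random(0)
# Toy stand-ins for per-frame features of a 4-frame clip, in time order.
clip = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1], [3.0, 3.1]]
shuffled, label = make_sorting_example(clip, rng)

pair_vector = pairwise_aggregate(shuffled)
print(len(pair_vector))                        # 6 pairs x 4 values = 24
print(recover_order(shuffled, label) == clip)  # True
```

In training, a classifier would take `pair_vector` as input and be supervised with `label`, so no human annotation is needed; the number of frame pairs grows as n(n-1)/2 with clip length n.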

Code Implementations: 1 repo