CV · Nov 9, 2018

Cross and Learn: Cross-Modal Self-Supervision

arXiv:1811.03879v3 · 86 citations · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for effective self-supervised learning methods in computer vision, particularly for action recognition, but appears incremental as it builds on existing cross-modal approaches.

The paper tackles self-supervised representation learning by leveraging cross-modal information (RGB and optical flow) from video data, achieving state-of-the-art performance among self-supervised methods on action recognition datasets.

In this paper, we present a self-supervised method for representation learning utilizing two different modalities. Based on the observation that cross-modal information has a high semantic meaning, we propose a method to effectively exploit this signal. For our approach, we utilize video data, since it is available on a large scale and provides easily accessible modalities given by RGB and optical flow. We demonstrate state-of-the-art performance on highly contested action recognition datasets in the context of self-supervised learning. We show that our feature representation also transfers to other tasks and conduct extensive ablation studies to validate our core contributions. Code and model can be found at https://github.com/nawidsayed/Cross-and-Learn.

Code Implementations: 1 repo
