CV, Jul 8, 2022

Pixel-level Correspondence for Self-Supervised Learning from Video

arXiv:2207.03866v16 citations, h-index: 98
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of using video for self-supervised learning in computer vision. It offers a method that improves dense prediction tasks, though the contribution appears incremental because it builds on existing contrastive learning and optical flow techniques.

The paper tackles the problem of learning dense visual representations from video without labels. It proposes Pixel-level Correspondence (PiCo), which uses optical flow tracking to match local features across time. PiCo outperforms self-supervised baselines on multiple dense prediction tasks while maintaining image classification performance.

While self-supervised learning has enabled effective representation learning in the absence of labels, for vision, video remains a relatively untapped source of supervision. To address this, we propose Pixel-level Correspondence (PiCo), a method for dense contrastive learning from video. By tracking points with optical flow, we obtain a correspondence map which can be used to match local features at different points in time. We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks, without compromising performance on image classification.
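
The abstract's two steps, tracking points with optical flow and then contrasting the matched local features, can be illustrated with a minimal sketch. This is not the authors' implementation: the summary does not specify the flow estimator, the sampling scheme, or the exact loss, so the nearest-pixel flow lookup, the InfoNCE-style loss, the temperature value, and the names `track_points` and `info_nce` are all assumptions made for illustration.

```python
import math


def track_points(points, flows, width, height):
    """Chain per-frame flow fields to follow points through time.

    Assumption: flows[t][row][col] is a (dx, dy) displacement from frame t
    to frame t + 1. Returns {point_index: (x, y)} for the points that stay
    inside the frame, which serves as a simple correspondence map.
    """
    def inside(x, y):
        return 0 <= round(x) < width and 0 <= round(y) < height

    tracked = {i: p for i, p in enumerate(points) if inside(*p)}
    for flow in flows:
        moved = {}
        for i, (x, y) in tracked.items():
            dx, dy = flow[round(y)][round(x)]  # nearest-pixel flow lookup
            if inside(x + dx, y + dy):         # drop points that leave the frame
                moved[i] = (x + dx, y + dy)
        tracked = moved
    return tracked


def normalize(vector):
    norm = math.sqrt(sum(c * c for c in vector)) or 1.0
    return [c / norm for c in vector]


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (an assumed choice, not confirmed by the source).

    anchors[i] and positives[i] are local features at corresponding points
    in two frames; every other positives[j] acts as a negative.
    """
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)
```

In a full pipeline, the features passed to `info_nce` would be read from an encoder's feature maps at the start and end positions returned by `track_points`. That sampling step is omitted here because the summary gives no detail on it.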

