CV · Dec 7, 2021

ViewCLR: Learning Self-supervised Video Representation for Unseen Viewpoints

arXiv:2112.03905v111.136 citations

Originality: Incremental advance

AI Analysis

This work addresses a key limitation in video understanding for applications such as robotics and surveillance, though the advance is incremental because it builds on existing self-supervised methods.

The paper tackles the failure of self-supervised video representations to generalize to unseen camera viewpoints. It proposes ViewCLR, which learns viewpoint-invariant representations and reports state-of-the-art results on cross-view benchmarks such as NTU RGB+D.

Self-supervised video representation learning predominantly focuses on discriminating instances generated by simple data augmentation schemes. However, the learned representation often fails to generalize to unseen camera viewpoints. To this end, we propose ViewCLR, which learns a self-supervised video representation that is invariant to camera viewpoint changes. We introduce a view-generator, which can be considered a learnable augmentation for any self-supervised pretext task, to generate a latent viewpoint representation of a video. ViewCLR maximizes the similarity between the latent viewpoint representation and the representation from the original viewpoint, enabling the learned video encoder to generalize to unseen camera viewpoints. Experiments on cross-view benchmark datasets, including NTU RGB+D, show that ViewCLR stands as a state-of-the-art viewpoint-invariant self-supervised method.
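
The abstract does not specify the exact objective, so the following is only a minimal sketch of what "maximizing the similarity" between two representations of the same video can look like, assuming a standard InfoNCE-style contrastive loss. The function names (`l2_normalize`, `info_nce_loss`), the temperature value, and the random embeddings standing in for encoder and view-generator outputs are all hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_loss(z_orig, z_view, temperature=0.1):
    """Generic InfoNCE loss between two batches of embeddings (assumed objective).

    z_orig: (batch, dim) embeddings of videos from the original viewpoint.
    z_view: (batch, dim) embeddings of the same videos in a latent viewpoint.
    Row i of z_orig and row i of z_view form the positive pair; all other
    rows in the batch act as negatives.
    """
    z_orig = l2_normalize(z_orig)
    z_view = l2_normalize(z_view)
    logits = z_orig @ z_view.T / temperature            # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # positives lie on the diagonal


# Stand-in embeddings (hypothetical): no real encoder or view-generator is used.
rng = np.random.default_rng(0)
z_orig = rng.normal(size=(8, 16))
z_aligned = z_orig + 0.01 * rng.normal(size=(8, 16))  # nearly viewpoint-invariant
z_random = rng.normal(size=(8, 16))                    # unrelated representations

loss_aligned = info_nce_loss(z_orig, z_aligned)
loss_random = info_nce_loss(z_orig, z_random)
print(loss_aligned < loss_random)
```

In this sketch the loss is lower when the two representations of each video agree, which is the behavior a viewpoint-invariant encoder would be trained toward under this assumed objective.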

