CV · May 11, 2021

Representation Learning via Global Temporal Alignment and Cycle-Consistency

arXiv:2105.05217v1 · 66 citations
Originality: Incremental advance
AI Analysis

This work addresses the problem of learning robust representations from temporal data for applications like video analysis, though it appears incremental as it builds on existing alignment techniques.

The paper tackles representation learning by aligning temporal sequences like videos of the same process, using a weakly supervised method based on global temporal ordering and cycle-consistency, resulting in significant performance increases in tasks such as fine-grained action classification and few-shot learning.

We introduce a weakly supervised method for representation learning based on aligning temporal sequences (e.g., videos) of the same process (e.g., human action). The main idea is to use the global temporal ordering of latent correspondences across sequence pairs as a supervisory signal. In particular, we propose a loss based on scoring the optimal sequence alignment to train an embedding network. Our loss is based on a novel probabilistic path-finding view of dynamic time warping (DTW) that has three key features: (i) the local path routing decisions are contrastive and differentiable, (ii) pairwise distances are cast as probabilities that are contrastive as well, and (iii) our formulation naturally admits a global cycle-consistency loss that verifies correspondences. For evaluation, we consider the tasks of fine-grained action classification, few-shot learning, and video synchronization. We report significant performance increases over previous methods. In addition, we report two applications of our temporal alignment framework, namely 3D pose reconstruction and fine-grained audio/visual retrieval.
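To make the idea of scoring an alignment with a differentiable DTW more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes a generic smooth-min DTW recursion and a row-wise softmax over cosine similarities as a stand-in for the "distances cast as probabilities" step, and it leaves out the cycle-consistency term entirely. The function names (`softmin`, `contrastive_costs`, `soft_dtw`) and the `temperature` and `gamma` values are hypothetical choices made for illustration.

```python
import numpy as np


def softmin(values, gamma):
    """Smooth minimum: -gamma * log(sum(exp(-v / gamma))), computed stably."""
    v = np.asarray(values, dtype=float)
    m = v.min()  # shift by the true minimum to avoid overflow
    return m - gamma * np.log(np.sum(np.exp(-(v - m) / gamma)))


def contrastive_costs(X, Y, temperature=0.1):
    """Assumed cost: negative log of a row-wise softmax over cosine similarities.

    X has shape (n, d) and Y has shape (m, d); the result has shape (n, m).
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    logits = Xn @ Yn.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(probs + 1e-12)


def soft_dtw(cost, gamma=0.1):
    """Smooth-min DTW score of an (n, m) cost matrix (lower means better aligned)."""
    n, m = cost.shape
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Each cell extends the best (soft) path from its three predecessors.
            prev = [R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]]
            R[i, j] = cost[i - 1, j - 1] + softmin(prev, gamma)
    return R[n, m]


# Toy usage: five distinct "frame embeddings", compared in order and reversed.
frames = np.eye(5)
score_in_order = soft_dtw(contrastive_costs(frames, frames))
score_reversed = soft_dtw(contrastive_costs(frames, frames[::-1]))
print(score_in_order < score_reversed)
```

In this toy setting the in-order pair scores lower than the reversed pair, which is the property a training loss of this kind would exploit. A real training setup would compute the same recursion in an autodiff framework so gradients reach the embedding network.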

Code Implementations: 1 repo
