CV | LG | IV | Apr 1, 2024

Temporally Consistent Unbalanced Optimal Transport for Unsupervised Action Segmentation

arXiv:2404.01518v3 | 38 citations | h-index: 5 | CVPR
Originality: Highly original
AI Analysis

The paper addresses the problem of segmenting actions in untrimmed videos without requiring a known action order, an incremental step that nonetheless improves unsupervised learning pipelines.

The paper tackles unsupervised action segmentation in long videos by proposing a novel optimal transport approach with temporal consistency, achieving state-of-the-art results on multiple datasets.

We propose a novel approach to the action segmentation task for long, untrimmed videos, based on solving an optimal transport problem. By encoding a temporal consistency prior into a Gromov-Wasserstein problem, we are able to decode a temporally consistent segmentation from a noisy affinity/matching cost matrix between video frames and action classes. Unlike previous approaches, our method does not require knowing the action order for a video to attain temporal consistency. Furthermore, our resulting (fused) Gromov-Wasserstein problem can be efficiently solved on GPUs using a few iterations of projected mirror descent. We demonstrate the effectiveness of our method in an unsupervised learning setting, where our method is used to generate pseudo-labels for self-training. We evaluate our segmentation approach and unsupervised learning pipeline on the Breakfast, 50-Salads, YouTube Instructions and Desktop Assembly datasets, yielding state-of-the-art results for the unsupervised video action segmentation task.
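The decoding idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a simplified objective (a linear matching cost plus a quadratic Gromov-Wasserstein-style term that penalises nearby frames assigned to different classes), keeps only the per-frame marginal constraint so class sizes are left free, and normalises each row to sum to 1 for readability. The function name `decode_segmentation` and the parameters `radius`, `alpha`, `step` and `n_iters` are hypothetical.

```python
import numpy as np

def decode_segmentation(cost, radius=2, alpha=0.5, step=1.0, n_iters=30):
    """Sketch: decode frame labels from a (frames x classes) cost matrix.

    Assumed objective (not taken from the paper's code):
        F(P) = (1 - alpha) * <cost, P> + (alpha / 2) * <Cx @ P @ Ca, P>
    where Cx marks distinct frames within `radius` of each other and
    Ca marks pairs of different classes.
    """
    cost = np.asarray(cost, dtype=float)
    n_frames, n_classes = cost.shape

    # Frame structure: 1 for distinct frames at most `radius` apart.
    idx = np.arange(n_frames)
    gap = np.abs(idx[:, None] - idx[None, :])
    frame_struct = ((gap > 0) & (gap <= radius)).astype(float)
    # Class structure: 1 for different classes, 0 on the diagonal.
    class_struct = 1.0 - np.eye(n_classes)

    # Start from a uniform soft assignment, stored in the log domain.
    log_p = np.full((n_frames, n_classes), -np.log(n_classes))
    for _ in range(n_iters):
        p = np.exp(log_p)
        # Gradient of F; both structure matrices are symmetric.
        grad = (1.0 - alpha) * cost + alpha * frame_struct @ p @ class_struct
        # Mirror (multiplicative) step under the KL geometry.
        log_p = log_p - step * grad
        # KL projection onto the row constraint: renormalise each row.
        row_max = log_p.max(axis=1, keepdims=True)
        log_norm = row_max + np.log(np.exp(log_p - row_max).sum(axis=1, keepdims=True))
        log_p = log_p - log_norm

    p = np.exp(log_p)
    return p.argmax(axis=1), p

# Toy example: 12 frames, 2 classes, with one noisy frame (index 2).
true_labels = np.array([0] * 6 + [1] * 6)
toy_cost = np.full((12, 2), 0.8)
toy_cost[np.arange(12), true_labels] = 0.2
toy_cost[2] = [0.55, 0.45]  # on its own, this frame prefers class 1
labels, soft = decode_segmentation(toy_cost)
```

In the toy example, a per-frame argmin of `toy_cost` mislabels frame 2, while the temporal term is meant to pull it back toward its neighbours' class. No action order is supplied anywhere, which mirrors the abstract's claim in spirit only; the paper's actual formulation and solver settings may differ.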

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
