CV · Dec 2, 2021

Iterative Contrast-Classify For Semi-supervised Temporal Action Segmentation

arXiv:2112.01402v2 · 36 citations
Originality: Incremental advance
AI Analysis

This paper addresses the frame-wise labeling bottleneck faced by video analysis researchers. It offers a semi-supervised approach whose novelty is incremental, since it combines existing techniques.

The paper tackles the high labeling cost of temporal action segmentation by proposing the first semi-supervised method for the task. The method combines unsupervised representation learning with an iterative training scheme. With only 40% of videos labeled, it performs similarly to fully-supervised methods; with 100% of videos labeled, it improves MoF (mean over frames, i.e. frame-wise accuracy) by up to 5.6% on the Breakfast, 50Salads and GTEA datasets.
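For readers unfamiliar with the metric, MoF is the fraction of frames whose predicted action label matches the ground truth. The following is a minimal sketch, assuming frames are pooled across all videos before averaging; the function name `mean_over_frames` and the example labels are illustrative and are not taken from the paper's code.

```python
from typing import Hashable, Sequence


def mean_over_frames(
    predictions: Sequence[Sequence[Hashable]],
    ground_truth: Sequence[Sequence[Hashable]],
) -> float:
    """Frame-wise accuracy pooled over all videos (assumed pooling scheme)."""
    if len(predictions) != len(ground_truth):
        raise ValueError("number of videos differs between predictions and ground truth")
    correct = 0
    total = 0
    for pred, gt in zip(predictions, ground_truth):
        if len(pred) != len(gt):
            raise ValueError("each video needs one prediction per labeled frame")
        # Count frames where the predicted label equals the annotated label.
        correct += sum(p == g for p, g in zip(pred, gt))
        total += len(gt)
    if total == 0:
        raise ValueError("no frames to evaluate")
    return correct / total


# Two toy videos with hypothetical action labels.
preds = [["pour", "pour", "stir", "stir"], ["cut", "cut", "cut"]]
truth = [["pour", "stir", "stir", "stir"], ["cut", "cut", "mix"]]
score = mean_over_frames(preds, truth)  # 5 of 7 frames match
```

Pooling frames means long videos weigh more than short ones; averaging per-video accuracies instead would be a different (also plausible) convention.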

Temporal action segmentation classifies the action of each frame in (long) video sequences. Due to the high cost of frame-wise labeling, we propose the first semi-supervised method for temporal action segmentation. Our method hinges on unsupervised representation learning, which, for temporal action segmentation, poses unique challenges. Actions in untrimmed videos vary in length and have unknown labels and start/end times. The ordering of actions across videos may also vary. We propose a novel way to learn frame-wise representations from temporal convolutional networks (TCNs) by clustering input features with an added time-proximity condition and multi-resolution similarity. By merging representation learning with conventional supervised learning, we develop an "Iterative-Contrast-Classify (ICC)" semi-supervised learning scheme. With more labelled data, ICC progressively improves in performance; ICC semi-supervised learning, with 40% labelled videos, performs similarly to fully-supervised counterparts. Our ICC improves MoF by {+1.8, +5.6, +2.5}% on Breakfast, 50Salads and GTEA respectively for 100% labelled videos.
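One way to picture "clustering input features with an added time-proximity condition" is as a rule for choosing positive pairs for contrastive learning: two frames count as similar only if they fall in the same feature cluster and occur at nearby relative positions in their videos. The sketch below is a simplified illustration under stated assumptions, not the paper's implementation. It assumes cluster ids have already been computed (for example by k-means on the input features), that time is normalized to [0, 1] per video, and that `delta` is a hypothetical proximity threshold. The multi-resolution similarity mentioned in the abstract is not covered.

```python
from typing import List, Sequence, Tuple


def positive_pairs(
    cluster_ids: Sequence[int],
    rel_times: Sequence[float],
    delta: float = 0.1,
) -> List[Tuple[int, int]]:
    """Return index pairs (i, j), i < j, treated as positives.

    A pair is positive when both frames share a cluster id AND their
    normalized timestamps differ by at most `delta` (assumed criterion).
    """
    if len(cluster_ids) != len(rel_times):
        raise ValueError("one cluster id and one relative time per frame required")
    pairs = []
    n = len(cluster_ids)
    for i in range(n):
        for j in range(i + 1, n):
            same_cluster = cluster_ids[i] == cluster_ids[j]
            close_in_time = abs(rel_times[i] - rel_times[j]) <= delta
            if same_cluster and close_in_time:
                pairs.append((i, j))
    return pairs


# Four frames: frames 0, 1 and 3 share a cluster, but frame 3 is far away in time.
example = positive_pairs([0, 0, 1, 0], [0.0, 0.05, 0.06, 0.9], delta=0.1)
```

In this toy input only frames 0 and 1 form a positive pair. The time condition is what keeps visually similar frames from distant parts of a video (which may belong to different actions) from being pulled together.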

Code Implementations: 1 repo
