CV · Mar 9, 2023

TAEC: Unsupervised Action Segmentation with Temporal-Aware Embedding and Clustering

IBM · MIT
arXiv:2303.05166v1 · 3 citations · h-index: 88
Originality: Incremental advance
AI Analysis

The paper addresses the high cost of manual annotation for video action segmentation, though it is an incremental improvement over existing unsupervised methods.

The paper tackles unsupervised action segmentation in untrimmed videos by proposing a temporal embedding network and clustering pipeline, achieving state-of-the-art results on three challenging datasets.

Temporal action segmentation in untrimmed videos has gained increased attention recently. However, annotating action classes and frame-wise boundaries is extremely time-consuming and costly, especially on large-scale datasets. To address this issue, we propose an unsupervised approach for learning action classes from untrimmed video sequences. In particular, we propose a temporal embedding network that combines relative time prediction, feature reconstruction, and sequence-to-sequence learning to preserve the spatial layout and sequential nature of the video features. A two-step clustering pipeline on these embedded feature representations then allows us to enforce temporal consistency both within and across videos. Based on the identified clusters, we decode the video into coherent temporal segments that correspond to semantically meaningful action classes. Our evaluation on three challenging datasets shows the impact of each component and demonstrates state-of-the-art unsupervised action segmentation results.
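
To make some of the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the relative time prediction target for a frame is its normalized position in the video, that clusters can be put in temporal order by their mean relative time, and that segments can be read off by merging consecutive frames with the same cluster label. The function names (`relative_time_targets`, `order_clusters_by_time`, `labels_to_segments`) are hypothetical, and the paper's actual embedding network, clustering, and decoding steps may differ.

```python
import numpy as np


def relative_time_targets(num_frames):
    """Normalized frame positions in [0, 1] (assumed regression target)."""
    if num_frames <= 1:
        return np.zeros(num_frames)
    return np.arange(num_frames) / (num_frames - 1)


def order_clusters_by_time(labels, times):
    """Return cluster ids sorted by the mean relative time of their frames."""
    ids = np.unique(labels)
    means = [times[labels == c].mean() for c in ids]
    return [int(c) for c in ids[np.argsort(means)]]


def labels_to_segments(labels):
    """Merge runs of identical labels into (start, end_exclusive, label)."""
    segments = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the video or on a label change.
        if i == len(labels) or labels[i] != labels[start]:
            segments.append((start, i, int(labels[start])))
            start = i
    return segments


# Toy frame-level cluster assignments for one 8-frame video (made-up data).
frame_labels = np.array([2, 2, 2, 0, 0, 1, 1, 1])
frame_times = relative_time_targets(len(frame_labels))

print(order_clusters_by_time(frame_labels, frame_times))
print(labels_to_segments(frame_labels))
```

In this toy setting the frame labels are given directly; in the pipeline described above they would come from clustering the learned embeddings.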

