CV, Jul 2, 2022

Turning to a Teacher for Timestamp Supervised Temporal Action Segmentation

arXiv:2207.00712v1, 6 citations, h-index: 20
Originality: Incremental advance
AI Analysis

This work addresses a specific bottleneck in weakly supervised video action segmentation: pseudo labels generated from sparse timestamp annotations are noisy and unstable during training. It aims to improve training stability while keeping annotation cost low.

The paper tackles noise and instability in timestamp-supervised temporal action segmentation. It introduces a teacher model to stabilize pseudo-label generation, together with a segmentally smoothing loss, and reports state-of-the-art performance on three datasets, with results comparable to fully supervised methods at lower annotation cost.

Temporal action segmentation in videos has drawn much attention recently. Timestamp supervision is a cost-effective form of annotation for this task. To obtain more information for optimizing the model, the existing method generates frame-wise pseudo labels iteratively from the output of a segmentation model and the timestamp annotations. However, this practice may introduce noise and oscillation during training and lead to performance degradation. To address this problem, we propose a new framework for timestamp-supervised temporal action segmentation that introduces a teacher model in parallel with the segmentation model to help stabilize model optimization. The teacher model can be seen as an ensemble of the segmentation model, which helps to suppress the noise and improve the stability of the pseudo labels. We further introduce a segmentally smoothing loss, which is more focused and cohesive, to enforce smooth transitions of the predicted probabilities within action instances. Experiments on three datasets show that our method outperforms the state-of-the-art method and performs comparably to fully supervised methods at a much lower annotation cost.
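
The two ideas in the abstract can be illustrated with a small sketch. It is not the authors' implementation. It assumes the teacher is kept as an exponential moving average (EMA) of the segmentation model's weights, which is one common way to form an ensemble over training steps, and it assumes the smoothing penalty is a mean squared difference between adjacent frames that share the same pseudo label. The function names `ema_update` and `within_segment_smoothing`, the `decay` parameter, and the toy data are all hypothetical.

```python
from typing import Dict, List


def ema_update(teacher: Dict[str, float],
               student: Dict[str, float],
               decay: float = 0.99) -> None:
    """Move each teacher weight toward the matching student weight.

    Assumption: the "ensemble" teacher is an EMA of the student's weights.
    The abstract does not specify how the ensemble is formed.
    """
    for name, weight in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * weight


def within_segment_smoothing(probs: List[List[float]],
                             labels: List[int]) -> float:
    """Average squared change in class probabilities between adjacent frames.

    Only frame pairs with the same pseudo label contribute, so transitions
    across action boundaries are not penalized. This is an illustrative
    stand-in, not the paper's exact loss.
    """
    total, count = 0.0, 0
    for t in range(1, len(probs)):
        if labels[t] != labels[t - 1]:
            continue  # skip pairs that straddle an action boundary
        total += sum((a - b) ** 2 for a, b in zip(probs[t], probs[t - 1]))
        count += 1
    return total / count if count else 0.0


# Toy usage: one scalar "weight" and three frames over two classes.
teacher = {"w": 0.0}
student = {"w": 1.0}
ema_update(teacher, student, decay=0.9)

frame_probs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
pseudo_labels = [0, 0, 1]
penalty = within_segment_smoothing(frame_probs, pseudo_labels)
```

Because the teacher changes slowly, pseudo labels derived from its predictions would fluctuate less between iterations than labels derived from the student directly, which is the stabilizing effect the abstract describes.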

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
