CV · Mar 24, 2025

Cost-Sensitive Learning for Long-Tailed Temporal Action Segmentation

Zhanzhong Pang, Fadime Sener, Shrinivas Ramasubramanian, Angela Yao

arXiv:2503.18358v1 · 2 citations · h-index: 14 · BMVC

Originality Incremental advance

AI Analysis

This work addresses the challenge of long-tailed action distributions in procedural videos, an incremental advance for video analysis tasks.

The paper tackles the problem of long-tailed distributions in temporal action segmentation by addressing class-level and transition-level biases, resulting in significant improvements in per-class frame-wise and segment-wise performance on three benchmarks.

Temporal action segmentation in untrimmed procedural videos aims to densely label frames into action classes. These videos inherently exhibit long-tailed distributions, where actions vary widely in frequency and duration. In temporal action segmentation approaches, we identified a bi-level learning bias. This bias encompasses (1) a class-level bias, stemming from class imbalance favoring head classes, and (2) a transition-level bias arising from variations in transitions, prioritizing commonly observed transitions. As a remedy, we introduce a constrained optimization problem to alleviate both biases. We define learning states for action classes and their associated transitions and integrate them into the optimization process. We propose a novel cost-sensitive loss function formulated as a weighted cross-entropy loss, with weights adaptively adjusted based on the learning state of actions and their transitions. Experiments on three challenging temporal segmentation benchmarks and various frameworks demonstrate the effectiveness of our approach, resulting in significant improvements in both per-class frame-wise and segment-wise performance.
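The abstract describes the loss only as a weighted cross-entropy whose weights adapt to the learning state of actions and transitions; it does not give the formulation. As a rough illustration of the general idea of a class-weighted frame-wise cross-entropy, here is a minimal sketch that substitutes static inverse-frequency weights for the paper's adaptive ones. The function names `inverse_frequency_weights` and `weighted_cross_entropy` are hypothetical, and transition-level weighting is not modeled.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Per-class weights proportional to 1 / frame count.

    Assumption: a static stand-in for the paper's adaptive,
    learning-state-based weights, which the abstract does not specify.
    Weights are scaled so they average to 1 over the observed classes.
    """
    counts = Counter(labels)
    raw = [1.0 / counts[c] if counts[c] > 0 else 0.0 for c in range(num_classes)]
    seen = sum(1 for w in raw if w > 0)
    scale = seen / sum(raw)
    return [w * scale for w in raw]


def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """Weighted mean of -log p(true class) over all frames.

    probs[t][c] is the predicted probability of class c at frame t.
    """
    total = 0.0
    norm = 0.0
    for frame_probs, y in zip(probs, labels):
        total += weights[y] * -math.log(max(frame_probs[y], eps))
        norm += weights[y]
    return total / norm


# Toy long-tailed video: class 0 is the head, class 2 the tail.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = inverse_frequency_weights(labels, num_classes=3)

# A uniform predictor, used only to exercise the loss.
uniform = [[1.0 / 3] * 3 for _ in labels]
loss = weighted_cross_entropy(uniform, labels, weights)
```

With these weights, a misclassified tail-class frame contributes more to the loss than a misclassified head-class frame, which is the basic mechanism any cost-sensitive cross-entropy relies on.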
