CV · AI · LG · Apr 3, 2025

Towards Generalizing Temporal Action Segmentation to Unseen Views

arXiv:2504.02512v1 · h-index: 5
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of view generalization in temporal action segmentation, which matters for applications such as robotics and surveillance. It is an incremental advance, however, because it builds on existing segmentation methods.

The paper tackles the problem of generalizing temporal action segmentation to unseen camera views. It defines an evaluation protocol and proposes an approach that uses shared representations at the sequence and segment levels, achieving a 12.8% increase in F1@50 for unseen exocentric views and a 54% improvement for unseen egocentric views.
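F1@50 is the segmental F1 score commonly reported in temporal action segmentation: a predicted segment counts as a true positive if it overlaps a not-yet-matched ground-truth segment with the same label at an intersection-over-union (IoU) of at least 0.5. The sketch below follows the common formulation used in public segmentation evaluation scripts; the function names are hypothetical, and the paper's own evaluation code (for instance its handling of background frames) may differ.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Segment = Tuple[Hashable, int, int]  # (label, start frame, end frame exclusive)


def get_segments(labels: Sequence[Hashable], background: Optional[Hashable] = None) -> List[Segment]:
    """Collapse frame-wise labels into contiguous segments, skipping the background label."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(labels) + 1):
        # A segment ends at the end of the sequence or where the label changes.
        if i == len(labels) or labels[i] != labels[start]:
            if labels[start] != background:
                segments.append((labels[start], start, i))
            start = i
    return segments


def f1_at_overlap(pred: Sequence[Hashable], gt: Sequence[Hashable],
                  threshold: float = 0.5, background: Optional[Hashable] = None) -> float:
    """Segmental F1 at an IoU threshold (threshold=0.5 gives F1@50)."""
    p_segs = get_segments(pred, background)
    g_segs = get_segments(gt, background)
    matched = [False] * len(g_segs)  # each ground-truth segment may be matched once
    tp = 0
    for p_label, p_start, p_end in p_segs:
        # Find the same-label ground-truth segment with the highest IoU.
        best_iou, best_j = 0.0, -1
        for j, (g_label, g_start, g_end) in enumerate(g_segs):
            if g_label != p_label:
                continue
            inter = max(0, min(p_end, g_end) - max(p_start, g_start))
            union = max(p_end, g_end) - min(p_start, g_start)
            iou = inter / union
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= threshold and not matched[best_j]:
            matched[best_j] = True
            tp += 1
    # Unmatched predictions are false positives; unmatched ground truth are false negatives.
    precision = tp / len(p_segs) if p_segs else 0.0
    recall = tp / len(g_segs) if g_segs else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gt = list("aaaabbbb")
    print(f1_at_overlap(list("aaabbbbb"), gt))            # prints 1.0
    print(round(f1_at_overlap(list("abaabbbb"), gt), 3))  # prints 0.667
```

The second prediction gets most frames right, but its two short spurious fragments become false positives, which is why F1@50 penalizes over-segmentation more strongly than frame-wise accuracy does.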

While there has been substantial progress in temporal action segmentation, the challenge of generalizing to unseen views remains unaddressed. Hence, we define a protocol for unseen-view action segmentation in which the camera views used to evaluate the model are unavailable during training. This includes changing from top-frontal views to a side view or, even more challenging, from exocentric to egocentric views. Furthermore, we present an approach for temporal action segmentation that tackles this challenge. Our approach leverages a shared representation at both the sequence and segment levels to reduce the impact of view differences during training. We achieve this by introducing a sequence loss and an action loss, which together facilitate consistent video and action representations across different views. The evaluation on the Assembly101, IkeaASM, and EgoExoLearn datasets demonstrates significant improvements, with a 12.8% increase in F1@50 for unseen exocentric views and a substantial 54% improvement for unseen egocentric views.
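The abstract does not spell out the form of the sequence and action losses. As a rough illustration only, one plausible reading is a cross-view consistency objective: pool the frame features of two synchronized views of the same recording into one vector per video (sequence level) and one vector per annotated action segment (segment level), then penalize the distance between the two views. The sketch assumes frame-aligned views and cosine distance, and its function names are hypothetical; it is not the paper's formulation.

```python
import math
from typing import Hashable, List, Sequence, Tuple

Vector = List[float]


def mean_pool(features: Sequence[Vector]) -> Vector:
    """Average a non-empty list of D-dim frame features into one D-dim vector."""
    dim = len(features[0])
    return [sum(f[d] for f in features) / len(features) for d in range(dim)]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return (1.0 - dot / norm) if norm > 0 else 1.0


def sequence_consistency_loss(view1: Sequence[Vector], view2: Sequence[Vector]) -> float:
    """Hypothetical sequence-level term: compare whole-video embeddings of two views."""
    return cosine_distance(mean_pool(view1), mean_pool(view2))


def action_consistency_loss(view1: Sequence[Vector], view2: Sequence[Vector],
                            segments: Sequence[Tuple[Hashable, int, int]]) -> float:
    """Hypothetical segment-level term: compare per-action embeddings of two views.

    Assumes both views are frame-aligned, so one list of (label, start, end)
    segments (end exclusive) applies to both.
    """
    losses = [cosine_distance(mean_pool(view1[s:e]), mean_pool(view2[s:e]))
              for _, s, e in segments]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Toy 2-D features: view2 swaps the features of the two action segments.
    view1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    view2 = [[0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
    segments = [("a", 0, 2), ("b", 2, 4)]
    print(round(sequence_consistency_loss(view1, view2), 6))  # prints 0.0
    print(action_consistency_loss(view1, view2, segments))    # prints 1.0
```

The toy example shows why two levels can complement each other: swapping features between segments leaves the mean-pooled video vectors identical, so only the segment-level term detects the mismatch.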
