CV · Apr 16

Unsupervised Skeleton-Based Action Segmentation via Hierarchical Spatiotemporal Vector Quantization

arXiv:2604.15196 · 56.6 · h-index: 21
Predicted impact top 62% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the problem of unsupervised temporal action segmentation from skeleton data, which is important for activity understanding in domains like healthcare and robotics, and achieves a notable improvement over prior methods.

The paper introduces a hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation, achieving new state-of-the-art performance on HuGaDB, LARa, and BABEL benchmarks while reducing segment length bias.

We propose a novel hierarchical spatiotemporal vector quantization framework for unsupervised skeleton-based temporal action segmentation. We first introduce a hierarchical approach, which includes two consecutive levels of vector quantization. Specifically, the lower level associates skeletons with fine-grained subactions, while the higher level further aggregates subactions into action-level representations. Our hierarchical approach outperforms the non-hierarchical baseline, while primarily exploiting spatial cues by reconstructing input skeletons. Next, we extend our approach by leveraging both spatial and temporal information, yielding a hierarchical spatiotemporal vector quantization scheme. In particular, our hierarchical spatiotemporal approach performs multi-level clustering, while simultaneously recovering input skeletons and their corresponding timestamps. Lastly, extensive experiments on multiple benchmarks, including HuGaDB, LARa, and BABEL, demonstrate that our approach establishes new state-of-the-art performance and reduces segment length bias in unsupervised skeleton-based temporal action segmentation.
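
To make the two-level idea in the abstract concrete, here is a minimal sketch of nearest-codebook assignment at two consecutive levels, followed by collapsing per-frame action labels into segments. It is not the authors' implementation. It assumes fixed, hand-picked codebooks and toy 2-D frame features, and it omits everything the paper learns, including the reconstruction of skeletons and timestamps. The names `nearest`, `hierarchical_quantize`, and `to_segments` are hypothetical.

```python
from itertools import groupby


def nearest(vec, codebook):
    """Return the index of the codebook entry closest to vec (squared Euclidean)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
    return dists.index(min(dists))


def hierarchical_quantize(frames, sub_codebook, act_codebook):
    """Two consecutive quantization levels (assumed structure, not the paper's code).

    Lower level: each frame feature -> nearest subaction code.
    Higher level: each chosen subaction code vector -> nearest action code.
    """
    sub_idx = [nearest(f, sub_codebook) for f in frames]
    act_idx = [nearest(sub_codebook[i], act_codebook) for i in sub_idx]
    return sub_idx, act_idx


def to_segments(labels):
    """Collapse per-frame labels into (start, end_exclusive, label) runs."""
    segs, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        segs.append((start, start + length, label))
        start += length
    return segs


# Toy data: six frames with 2-D features (hypothetical values).
frames = [[0.1, 0.0], [0.9, 0.0], [1.1, 0.0], [10.2, 0.0], [10.9, 0.0], [0.0, 0.0]]
sub_codebook = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]]  # 4 subactions
act_codebook = [[0.5, 0.0], [10.5, 0.0]]                           # 2 actions

sub_idx, act_idx = hierarchical_quantize(frames, sub_codebook, act_codebook)
segments = to_segments(act_idx)
```

In this toy setup, frames that fall into different subactions (codes 0 and 1) still merge into a single action segment at the higher level, which is the aggregation behaviour the abstract describes.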
