CV · Mar 6

Point-Supervised Skeleton-Based Human Action Segmentation

arXiv:2603.06201v1 · h-index: 4
Predicted impact: top 59% in CV · last 90 days
Originality: Incremental advance
AI Analysis

This work addresses the annotation burden of temporal action segmentation, a capability intelligent systems need in order to perceive and respond to human activity, and represents an incremental improvement over existing methods.

The paper reduces annotation costs for skeleton-based human action segmentation with a point-supervised framework that requires only one labeled frame per action segment. It achieves competitive performance on the PKU-MMD (X-Sub and X-View), MCFS-22, and MCFS-130 benchmarks and surpasses some fully supervised methods.
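Point supervision means each action segment contributes exactly one labeled frame. When a benchmark ships only frame-level ground truth, point labels are commonly simulated by sampling one frame inside every segment. The sketch below assumes uniform random sampling with a fixed seed; the source does not say how its point labels were chosen, and the helper names `segments` and `sample_points` are hypothetical.

```python
import numpy as np


def segments(frame_labels):
    """Split a frame-level label sequence into (start, end_exclusive, label) runs."""
    frame_labels = np.asarray(frame_labels)
    change = np.flatnonzero(np.diff(frame_labels)) + 1  # first frame of each new run
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change, [len(frame_labels)]))
    return [(int(s), int(e), int(frame_labels[s])) for s, e in zip(starts, ends)]


def sample_points(frame_labels, seed=0):
    """Hypothetical simulation of point labels: one uniformly sampled frame per segment."""
    rng = np.random.default_rng(seed)
    return [(int(rng.integers(s, e)), c) for s, e, c in segments(frame_labels)]
```

On the nine-frame toy sequence `[0, 0, 0, 1, 1, 2, 2, 2, 2]`, `segments` finds three runs, so three of nine frames carry a label; on real recordings with long segments the labeled fraction is far smaller.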

Skeleton-based temporal action segmentation is a fundamental yet challenging task, playing a crucial role in enabling intelligent systems to perceive and respond to human activities. While fully supervised methods achieve satisfactory performance, they require costly frame-level annotations and are sensitive to ambiguous action boundaries. To address these issues, we introduce a point-supervised framework for skeleton-based action segmentation, where only a single frame per action segment is labeled. We leverage multimodal skeleton data, including joint, bone, and motion information, encoded via a pretrained unified model to extract rich feature representations. To generate reliable pseudo-labels, we propose a novel prototype similarity method and integrate it with two existing methods: energy function and constrained K-Medoids clustering. Multimodal pseudo-label integration is proposed to enhance the reliability of the pseudo-labels and to guide model training. We establish new benchmarks on PKU-MMD (X-Sub and X-View), MCFS-22, and MCFS-130, and implement baselines for point-supervised skeleton-based human action segmentation. Extensive experiments show that our method achieves competitive performance, even surpassing some fully supervised methods while significantly reducing annotation effort.
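The abstract's pipeline can be sketched as follows, under several assumptions. Bone vectors are taken as joint-minus-parent offsets and motion as frame differences, a common convention in skeleton-based recognition rather than a detail confirmed by the source. Per-frame features stand in for the output of the pretrained unified encoder, which is not reproduced here. Pseudo-labels are filled in between consecutive labeled points by choosing the single boundary that minimizes a per-frame cost: `energy_cost` uses squared distance to each anchor frame's feature (in the spirit of the energy-function approach), while `prototype_cost` is a hypothetical reading of prototype similarity that compares frames to the mean feature of each class's labeled frames. Constrained K-Medoids is omitted, and per-frame majority voting across modalities is only one plausible form of multimodal integration. All function names are illustrative.

```python
import numpy as np


def bone_and_motion(joints, parents):
    """Assumed convention: derive bone and motion streams from joints of shape (T, J, C).

    parents[j] is the parent joint of j; the root maps to itself, so its bone is zero.
    Motion is the frame-to-frame joint displacement, set to zero at t = 0.
    """
    bones = joints - joints[:, parents, :]
    motion = np.zeros_like(joints)
    motion[1:] = joints[1:] - joints[:-1]
    return bones, motion


def _best_split(d0, d1):
    """Return b minimizing sum(d0[:b+1]) + sum(d1[b+1:]), keeping >= 1 frame per side."""
    cost = np.cumsum(d0)[:-1] + (d1.sum() - np.cumsum(d1)[:-1])
    return int(np.argmin(cost))


def propagate(feats, points, frame_cost):
    """Hypothetical helper: dense pseudo-labels from sorted, distinct (frame, label) points.

    frame_cost(seg_feats, anchor_frame, label) returns one cost per frame in seg_feats.
    """
    labels = np.empty(len(feats), dtype=int)
    labels[: points[0][0]] = points[0][1]  # frames before the first point
    for (t0, c0), (t1, c1) in zip(points[:-1], points[1:]):
        seg = feats[t0 : t1 + 1]  # both anchor frames included
        b = _best_split(frame_cost(seg, t0, c0), frame_cost(seg, t1, c1))
        labels[t0 : t0 + b + 1] = c0
        labels[t0 + b + 1 : t1 + 1] = c1
    labels[points[-1][0] :] = points[-1][1]  # frames from the last point onward
    return labels


def energy_cost(feats):
    """Squared Euclidean distance from each frame to the anchor frame's feature."""
    return lambda seg, t, c: ((seg - feats[t]) ** 2).sum(axis=1)


def prototype_cost(feats, points):
    """Assumed prototype variant: 1 - cosine similarity to each class's mean labeled feature."""
    protos = {}
    for c in {label for _, label in points}:
        idx = [t for t, label in points if label == c]
        protos[c] = feats[idx].mean(axis=0)

    def cost(seg, t, c):
        p = protos[c]
        sim = seg @ p / (np.linalg.norm(seg, axis=1) * np.linalg.norm(p) + 1e-8)
        return 1.0 - sim

    return cost


def integrate(label_seqs):
    """Per-frame majority vote over pseudo-label sequences (ties go to the smaller class id)."""
    stacked = np.stack(label_seqs)  # (num_sequences, T)
    frames = np.arange(stacked.shape[1])
    votes = np.zeros((stacked.max() + 1, stacked.shape[1]), dtype=int)
    for seq in stacked:
        votes[seq, frames] += 1
    return votes.argmax(axis=0)
```

Running `propagate` on joint, bone, and motion features separately and passing the resulting label sequences to `integrate` mirrors the multimodal integration idea; a confidence-weighted vote would be an equally reasonable design choice.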
