CVSep 5, 2025

DuoCLR: Dual-Surrogate Contrastive Learning for Skeleton-based Human Action Segmentation

arXiv:2509.05543v13 citationsh-index: 3
Originality Incremental advance
AI Analysis

This work addresses action segmentation for video analysis, offering incremental improvements through a tailored pre-training approach.

The paper tackles the problem of human action segmentation from skeleton data by proposing a contrastive learning framework with novel data augmentation and surrogate tasks, resulting in a significant boost over state-of-the-art methods in multi-class and multi-label segmentation tasks.

In this paper, a contrastive representation learning framework is proposed to enhance human action segmentation via pre-training using trimmed (single action) skeleton sequences. Unlike previous representation learning works that are tailored for action recognition and that build upon isolated sequence-wise representations, the proposed framework focuses on exploiting multi-scale representations in conjunction with cross-sequence variations. More specifically, it proposes a novel data augmentation strategy, 'Shuffle and Warp', which exploits diverse multi-action permutations. The latter effectively assists two surrogate tasks that are introduced in contrastive learning: Cross Permutation Contrasting (CPC) and Relative Order Reasoning (ROR). In optimization, CPC learns intra-class similarities by contrasting representations of the same action class across different permutations, while ROR reasons about inter-class contexts by predicting relative mapping between two permutations. Together, these tasks enable a Dual-Surrogate Contrastive Learning (DuoCLR) network to learn multi-scale feature representations optimized for action segmentation. In experiments, DuoCLR is pre-trained on a trimmed skeleton dataset and evaluated on an untrimmed dataset where it demonstrates a significant boost over state-the-art comparatives in both multi-class and multi-label action segmentation tasks. Lastly, ablation studies are conducted to evaluate the effectiveness of each component of the proposed approach.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes