CV · May 7

Jointly Learning Structured Representations and Stabilized Affinity for Human Motion Segmentation

arXiv:2605.05753 · 52.4 · h-index: 6
Predicted impact: top 67% in CV (last 90 days) · Originality: Incremental advance
AI Analysis

For researchers in video analysis and motion segmentation, this work addresses the violation of the union-of-subspaces assumption in real-world videos, offering a more robust and accurate method.

The paper tackles human motion segmentation (HMS) by proposing TDSC, which jointly learns structured representations and stabilized affinity. On five benchmark datasets, TDSC achieves state-of-the-art performance, e.g., 98.2% F-measure on the MoSeg dataset with DINOv2 features, outperforming prior methods by up to 5%.

Human Motion Segmentation (HMS), which aims to partition a video into non-overlapping segments corresponding to different human motions, has recently attracted increasing research attention. Existing HMS approaches are predominantly based on subspace clustering, which is grounded on the assumption that high-dimensional temporal features are distributed according to a Union-of-Subspaces (UoS). In real-world videos, however, raw frame-level features often violate the UoS assumption, which leads to unsatisfactory segmentation performance. To address this issue, we propose an efficient and effective approach for HMS, named Temporal Deep Self-expressive subspace Clustering (TDSC), which jointly learns temporally consistent structured representations and stabilized affinity for accurate and robust HMS. Specifically, TDSC alternately learns structured representations of the input frame features and self-expressive coefficients via a properly regularized self-expressive model. In this model, a coding-rate maximization regularizer is incorporated to avoid representation collapse and to make the learned representations conform to a desired UoS distribution, while temporal constraints encourage temporally adjacent frames to be assigned to the same group. Moreover, we develop a temporal momentum averaging mechanism to stabilize the evolution of the affinity, and we design a reparameterization strategy to enable efficient optimization. We conduct extensive experiments on five benchmark HMS datasets using both conventional (HoG) and recent deep features (CLIP and DINOv2) to validate the effectiveness of our approach.
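The coding-rate maximization regularizer can be illustrated with the standard coding-rate measure from the maximal coding rate reduction (MCR²) literature, R(Z) = ½ logdet(I + d/(nε²) Z Zᵀ). The sketch below assumes TDSC uses this form or a close variant; the function name `coding_rate` and the value of `eps` are illustrative choices, not taken from the paper. It shows why maximizing R(Z) discourages collapse: representations that all map to one vector score lower than representations spread over many directions.

```python
import numpy as np

def coding_rate(Z: np.ndarray, eps: float = 0.5) -> float:
    """Hypothetical helper: R(Z) = 1/2 * logdet(I + d / (n * eps^2) * Z Z^T).

    Z has shape (d, n): one d-dimensional representation per column (frame).
    """
    d, n = Z.shape
    scale = d / (n * eps ** 2)
    # slogdet is numerically safer than log(det(...)); the matrix is SPD,
    # so its determinant is positive and only the log-magnitude is needed.
    _, logdet = np.linalg.slogdet(np.eye(d) + scale * Z @ Z.T)
    return 0.5 * logdet

rng = np.random.default_rng(0)
d, n = 8, 200
# Unit-norm columns, as is common in MCR^2-style objectives.
spread = rng.standard_normal((d, n))
spread /= np.linalg.norm(spread, axis=0, keepdims=True)
# Collapsed representations: every frame mapped to the same vector.
collapsed = np.tile(spread[:, :1], (1, n))

print(coding_rate(spread) > coding_rate(collapsed))  # True
```

Because both inputs have the same total energy (trace of Z Zᵀ equals n), the concavity of log makes the rank-1 collapsed case strictly worse, which is the behaviour a collapse-avoiding regularizer relies on.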
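To make the idea of a self-expressive model with temporal constraints concrete, here is a simplified, hypothetical variant chosen for its closed form; it is not claimed to be TDSC's exact objective. It minimizes ‖Z − ZC‖²_F + λ‖C‖²_F + γ tr(C L Cᵀ), where L is the Laplacian of a banded graph linking frames at most `s` apart. The last term equals ½ Σᵢⱼ Wᵢⱼ ‖cᵢ − cⱼ‖², so it pulls the coefficient vectors of neighbouring frames together. Setting the gradient to zero gives a Sylvester equation, solved here with `scipy.linalg.solve_sylvester`. The names `temporal_laplacian` and `temporal_self_expression`, and the values of `lam`, `gamma` and `s`, are assumptions. The common diag(C) = 0 constraint is omitted for simplicity.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def temporal_laplacian(n: int, s: int = 2) -> np.ndarray:
    """Laplacian L = D - W of a banded graph linking frames at most s apart."""
    idx = np.arange(n)
    W = (np.abs(idx[:, None] - idx[None, :]) <= s).astype(float)
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

def temporal_self_expression(Z, lam=0.1, gamma=1.0, s=2):
    """Minimise ||Z - Z C||_F^2 + lam ||C||_F^2 + gamma tr(C L C^T).

    The zero-gradient condition is the Sylvester equation
    (Z^T Z + lam I) C + C (gamma L) = Z^T Z.
    """
    n = Z.shape[1]
    G = Z.T @ Z
    L = temporal_laplacian(n, s)
    C = solve_sylvester(G + lam * np.eye(n), gamma * L, G)
    # Symmetric, non-negative affinity, as typically fed to spectral clustering.
    A = 0.5 * (np.abs(C) + np.abs(C.T))
    return C, A

# Toy "video": 50 frames from one 2-D subspace, then 50 from another.
rng = np.random.default_rng(0)
d, n_per = 10, 50
U1, U2 = rng.standard_normal((d, 2)), rng.standard_normal((d, 2))
Z = np.hstack([U1 @ rng.standard_normal((2, n_per)),
               U2 @ rng.standard_normal((2, n_per))])
C, A = temporal_self_expression(Z)

within = A[:n_per, :n_per].mean() + A[n_per:, n_per:].mean()
across = 2 * A[:n_per, n_per:].mean()
print("within-segment vs cross-segment affinity:", within, across)
```

The equation has a unique solution because Zᵀ Z + λI has strictly positive eigenvalues while −γL has non-positive ones, so their spectra never overlap.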
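The abstract does not spell out the temporal momentum averaging rule. One plausible reading, assumed here, is an exponential moving average of the affinity matrix across training iterations, A ← βA + (1 − β)A_new. The sketch below uses a hypothetical `momentum_update` helper and an illustrative β = 0.9. It shows how such averaging damps iteration-to-iteration noise around a stable block structure.

```python
import numpy as np

def momentum_update(A_avg, A_new, beta=0.9):
    """Hypothetical EMA of affinities across iterations.

    On the first call (A_avg is None) the new affinity is taken as-is;
    afterwards the result is beta * A_avg + (1 - beta) * A_new.
    """
    if A_avg is None:
        return A_new.copy()
    return beta * A_avg + (1.0 - beta) * A_new

rng = np.random.default_rng(0)
n = 40
# Idealised affinity for two temporal segments of 20 frames each.
target = np.kron(np.eye(2), np.ones((n // 2, n // 2)))

A_avg = None
for _ in range(50):
    noise = rng.normal(scale=0.5, size=(n, n))
    A_new = target + 0.5 * (noise + noise.T)  # symmetric, noisy per-iteration affinity
    A_avg = momentum_update(A_avg, A_new)

err_last = np.linalg.norm(A_new - target)
err_avg = np.linalg.norm(A_avg - target)
print(err_avg < err_last)  # True
```

A larger β gives a smoother but slower-reacting affinity; the trade-off TDSC actually uses is not stated in the abstract.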
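Since HMS outputs non-overlapping segments, per-frame cluster labels (for example, from spectral clustering of the learned affinity) must eventually be turned into contiguous runs. The helper below is a generic, hypothetical post-processing step that the abstract does not describe. It groups consecutive equal labels into `(start, end, label)` triples with an exclusive end index.

```python
from itertools import groupby

def labels_to_segments(labels):
    """Hypothetical helper: per-frame labels -> (start, end, label), end exclusive."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start, start + length, label))
        start += length
    return segments

print(labels_to_segments([0, 0, 0, 2, 2, 1, 1, 1, 1]))
# [(0, 3, 0), (3, 5, 2), (5, 9, 1)]
```

If the same label reappears later in the video, it yields a separate segment. This matches the notion of temporally non-overlapping segments even when one motion type recurs.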
