CV | AI | Apr 16, 2024

HumMUSS: Human Motion Understanding using State Space Models

arXiv:2404.10880v1 | 11 citations | h-index: 10 | CVPR
Originality: Incremental advance
AI Analysis

This work addresses two weaknesses of transformer-based human motion understanding: slow real-time sequential prediction and poor generalization to new frame rates. It represents an incremental improvement over existing methods.

The paper tackles the limitations of transformer-based models in human motion understanding by proposing an attention-free spatiotemporal model built on state space models. The model matches transformer performance on tasks such as pose estimation and action recognition, is several times faster in real-time sequential prediction, and adapts to different frame rates.

Understanding human motion from video is essential for a range of applications, including pose estimation, mesh recovery and action recognition. While state-of-the-art methods predominantly rely on transformer-based architectures, these approaches have limitations in practical scenarios. Transformers are slower when sequentially predicting on a continuous stream of frames in real time, and do not generalize to new frame rates. In light of these constraints, we propose a novel attention-free spatiotemporal model for human motion understanding building upon recent advancements in state space models. Our model not only matches the performance of transformer-based models in various motion understanding tasks but also brings added benefits like adaptability to different video frame rates and enhanced training speed when working with longer sequences of keypoints. Moreover, the proposed model supports both offline and real-time applications. For real-time sequential prediction, our model is both memory efficient and several times faster than transformer-based approaches while maintaining their high accuracy.
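The two properties the abstract highlights, constant-memory streaming and adaptability to frame rate, can be illustrated with a generic state space layer. The sketch below is not the HumMUSS architecture. It assumes a toy diagonal continuous-time system x'(t) = a·x(t) + b·u(t), y(t) = c·x(t), discretized with a zero-order hold. The function names and parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def discretize(a, b, dt):
    """Zero-order-hold discretization of the diagonal system x' = a*x + b*u.

    Assumes every entry of `a` is nonzero (here: negative, i.e. stable).
    """
    a_bar = np.exp(dt * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar

def run_stream(a, b, c, frames, fps):
    """Process frames one at a time, carrying only a fixed-size state."""
    # The frame rate enters only through the step size dt = 1 / fps.
    a_bar, b_bar = discretize(a, b, 1.0 / fps)
    x = np.zeros_like(a)
    outputs = []
    for u in frames:
        x = a_bar * x + b_bar * u  # work per frame depends on state size only
        outputs.append(float(c @ x))
    return np.array(outputs)

# Hypothetical parameters for a 3-dimensional state (illustration only).
a = np.array([-1.0, -2.0, -4.0])
b = np.ones(3)
c = np.array([0.5, 0.3, 0.2])

# The same one-second constant signal, sampled at two frame rates.
y30 = run_stream(a, b, c, np.ones(30), fps=30)
y60 = run_stream(a, b, c, np.ones(60), fps=60)

# Every second output of the 60 fps run lines up with the 30 fps run.
print(np.allclose(y60[1::2], y30))
```

In this toy setting the recurrent update keeps only the state vector `x` between frames, so memory does not grow with the length of the stream. Changing the frame rate only changes `dt`; the parameters `a`, `b`, and `c` stay the same. The exact agreement between the two runs relies on the input being constant between samples, which is the assumption the zero-order hold makes; for real keypoint sequences the agreement would be approximate.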

