Pose-Aware Multi-Level Motion Parsing for Action Quality Assessment
This work addresses automated scoring in sports competitions such as diving by capturing subtle pose variations. It is incremental, however: it builds on existing pose-based methods and adds modules for flexibility.
The paper tackles action quality assessment in sports by proposing a multi-level motion parsing framework that leverages enhanced spatial-temporal pose features, achieving state-of-the-art performance in action segmentation and scoring on large-scale diving datasets.
Human pose serves as a cornerstone of action quality assessment (AQA), where subtle spatial-temporal variations in pose often distinguish excellence from mediocrity. In high-level competitions, these nuanced differences become decisive factors in scoring. In this paper, we propose a novel multi-level motion parsing framework for AQA based on enhanced spatial-temporal pose features. On the first level, the Action-Unit Parser uses pose extraction to achieve precise action segmentation and comprehensive local-global pose representations. On the second level, the Motion Parser uses spatial-temporal feature learning to capture pose changes and appearance details for each action-unit. Conditions unrelated to the body, such as water splash in diving, also affect action scoring, so we design an additional Condition Parser that offers users more flexibility in their choices. Finally, a Weight-Adjust Scoring Module is introduced to better accommodate the diverse requirements of various action types and the multi-scale nature of action-units. Extensive evaluations on large-scale diving datasets demonstrate that our multi-level motion parsing framework achieves state-of-the-art performance in both action segmentation and action scoring tasks.
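The abstract does not specify how the Weight-Adjust Scoring Module combines per-unit results. As a rough illustration of the general idea only, the sketch below aggregates per-action-unit scores with softmax-normalized weights. The function names `softmax` and `weighted_action_score`, the use of softmax, and the example values are all assumptions made for illustration, not the authors' method.

```python
import math


def softmax(logits):
    """Normalize raw weights into positive values that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_action_score(unit_scores, unit_logits):
    """Combine per-action-unit scores into one score (hypothetical helper).

    unit_scores: one quality score per action-unit (e.g. take-off, flight, entry).
    unit_logits: one unnormalized importance weight per action-unit; in a
                 learned model these would be predicted, here they are given.
    """
    if not unit_scores:
        raise ValueError("at least one action-unit is required")
    if len(unit_scores) != len(unit_logits):
        raise ValueError("scores and logits must have the same length")
    weights = softmax(unit_logits)
    return sum(w * s for w, s in zip(weights, unit_scores))


# Illustrative values only: three action-units with equal importance,
# so the result is the plain mean of the unit scores.
example = weighted_action_score([8.0, 6.0, 9.0], [0.0, 0.0, 0.0])
```

With equal logits the aggregate reduces to the mean of the unit scores; raising one logit shifts the final score toward that unit, which is one simple way an action type could emphasize a particular phase.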