EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding
This work provides a dataset for computer vision and human-computer interaction researchers studying egocentric and exocentric action understanding. The contribution is somewhat incremental: it builds on existing full-body action datasets by adding first-person perspectives and richer annotations.
The authors address full-body action understanding by introducing EgoExo-Fitness, a dataset of synchronized egocentric and exocentric fitness videos with rich annotations and benchmarks for multiple tasks. The result is a new resource for studying the "what", "when", and "how well" of actions.
We present EgoExo-Fitness, a new full-body action understanding dataset featuring fitness sequence videos recorded from synchronized egocentric and fixed exocentric (third-person) cameras. Compared with existing full-body action understanding datasets, EgoExo-Fitness not only contains videos from first-person perspectives but also provides rich annotations. Specifically, two-level temporal boundaries are provided to localize single action videos along with the sub-steps of each action. More importantly, EgoExo-Fitness introduces innovative annotations for interpretable action judgement, including technical keypoint verification, natural language comments on action execution, and action quality scores. Together, these annotations make EgoExo-Fitness a new resource for studying egocentric and exocentric full-body action understanding across the dimensions of "what", "when", and "how well". To facilitate research in this area, we construct benchmarks on a suite of tasks (i.e., action classification, action localization, cross-view sequence verification, cross-view skill determination, and a newly proposed task of guidance-based execution verification), together with detailed analysis. Code and data will be available at https://github.com/iSEE-Laboratory/EgoExo-Fitness/tree/main.
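To make the two-level temporal annotation and the interpretable-judgement fields more concrete, the following is a minimal sketch of how one annotated action might be represented in code. The schema is hypothetical: the class names `ActionSegment` and `SubStep`, the field names, times in seconds, and one boolean per technical keypoint are illustrative assumptions, not the released annotation format. Consult the repository for the actual files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SubStep:
    # Hypothetical: sub-step boundaries in seconds within the sequence video.
    start: float
    end: float


@dataclass
class ActionSegment:
    # Hypothetical schema for one single-action segment (first temporal level).
    label: str
    start: float
    end: float
    sub_steps: List[SubStep] = field(default_factory=list)  # second temporal level
    keypoint_checks: List[bool] = field(default_factory=list)  # assumed: one pass/fail flag per technical keypoint
    comment: str = ""  # natural language comment on execution
    quality_score: float = 0.0  # action quality score (scale is an assumption)

    def is_consistent(self) -> bool:
        """Check that sub-steps lie inside the action, are non-empty, and do not overlap."""
        prev_end = self.start
        for s in self.sub_steps:
            if s.start < prev_end or s.end > self.end or s.start >= s.end:
                return False
            prev_end = s.end
        return True

    def keypoint_pass_rate(self) -> float:
        """Fraction of technical keypoints marked as satisfied (0.0 if none annotated)."""
        if not self.keypoint_checks:
            return 0.0
        return sum(self.keypoint_checks) / len(self.keypoint_checks)


seg = ActionSegment("action_A", 10.0, 18.0,
                    [SubStep(10.0, 13.5), SubStep(13.5, 18.0)],
                    keypoint_checks=[True, False])
print(seg.is_consistent(), seg.keypoint_pass_rate())
```

Nesting sub-steps inside each action segment mirrors the two-level boundaries described above, so a consistency check like `is_consistent` can catch sub-step annotations that spill outside their parent action.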