DynaHOI: Benchmarking Hand-Object Interaction for Dynamic Target
This work addresses the problem of evaluating hand motion generation in dynamic scenarios for robotics and computer vision. Its contribution is incremental in that it builds on existing static benchmarks.
The paper tackles the lack of benchmarks for hand-object interaction with moving targets by introducing DynaHOI-Gym, a platform for dynamic capture evaluation, and DynaHOI-10M, a large-scale benchmark with 10M frames and 180K trajectories. Its baseline method, ObAct, improves the location success rate by 8.1%.
Most existing hand motion generation benchmarks for hand-object interaction (HOI) focus on static objects, leaving dynamic scenarios with moving targets and time-critical coordination largely untested. To address this gap, we introduce DynaHOI-Gym, a unified online closed-loop platform with parameterized motion generators and rollout-based metrics for dynamic capture evaluation. Built on DynaHOI-Gym, we release DynaHOI-10M, a large-scale benchmark with 10M frames and 180K hand capture trajectories, whose target motions are organized into 8 major categories and 22 fine-grained subcategories. We also provide a simple observe-before-act baseline (ObAct) that integrates short-term observations with the current frame via spatiotemporal attention to predict actions, achieving an 8.1% improvement in location success rate.
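To make the idea of a parameterized motion generator evaluated by closed-loop rollouts concrete, the following is a minimal 2D sketch. It is not the DynaHOI-Gym API: the names `CircularMotion`, `chase_policy`, `rollout`, and `location_success_rate`, the circular motion family, the step limit, and the distance tolerance are all assumptions made for illustration. In particular, the abstract does not define location success, so here it is assumed to mean that the hand comes within a fixed tolerance of the target before the horizon ends.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class CircularMotion:
    """Hypothetical parameterized generator: the target moves on a circle."""
    radius: float
    angular_speed: float  # radians per time step; 0.0 gives a static target
    phase: float = 0.0

    def position(self, t: int) -> Point:
        angle = self.phase + self.angular_speed * t
        return (self.radius * math.cos(angle), self.radius * math.sin(angle))


def chase_policy(hand: Point, target: Point, max_step: float) -> Point:
    """Placeholder policy (not ObAct): step straight toward the observed target."""
    dx, dy = target[0] - hand[0], target[1] - hand[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return target
    scale = max_step / dist
    return (hand[0] + dx * scale, hand[1] + dy * scale)


def rollout(motion: CircularMotion,
            policy: Callable[[Point, Point, float], Point],
            horizon: int = 100,
            max_step: float = 0.05,
            tol: float = 0.02) -> bool:
    """Closed loop: observe at step t, act, then compare with the target at t + 1."""
    hand: Point = (0.0, 0.0)
    for t in range(horizon):
        observed = motion.position(t)
        hand = policy(hand, observed, max_step)
        moved = motion.position(t + 1)  # the target keeps moving while the hand acts
        if math.hypot(moved[0] - hand[0], moved[1] - hand[1]) <= tol:
            return True  # assumed success criterion: hand within tol of the target
    return False


def location_success_rate(motions: Sequence[CircularMotion],
                          policy: Callable[[Point, Point, float], Point],
                          **kwargs) -> float:
    """Fraction of rollouts that satisfy the assumed success criterion."""
    outcomes = [rollout(m, policy, **kwargs) for m in motions]
    return sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    motions = [
        CircularMotion(radius=0.5, angular_speed=0.02),  # slow target
        CircularMotion(radius=1.0, angular_speed=1.0),   # fast target
    ]
    print(location_success_rate(motions, chase_policy))
```

Because the success check uses the target position after the hand has moved, a policy that only reacts to the current observation is penalized on fast targets, which is the kind of time-critical coordination a rollout-based metric is meant to expose.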