CyboRacket: A Perception-to-Action Framework for Humanoid Racket Sports
This work addresses the problem of enabling humanoid robots to play racket sports autonomously using only onboard sensing. The contribution is incremental: it builds on existing methods but integrates them into a new framework for this specific domain.
The paper tackles dynamic ball-interaction tasks for humanoid robots in racket sports by developing CyboRacket, a hierarchical perception-to-action framework that integrates onboard visual perception, physics-based trajectory prediction, and pre-trained whole-body control. The result is successful real-time striking on the Unitree G1 robot using purely onboard sensing.
Dynamic ball-interaction tasks remain challenging for robots because they require tight perception-action coupling under limited reaction time. This challenge is especially pronounced in humanoid racket sports, where successful interception depends on accurate visual tracking, trajectory prediction, coordinated stepping, and stable whole-body striking. Existing robotic racket-sport systems often rely on external motion capture for state estimation or on task-specific low-level controllers that must be retrained across tasks and platforms. We present CyboRacket, a hierarchical perception-to-action framework for humanoid racket sports that integrates onboard visual perception, physics-based trajectory prediction, and large-scale pre-trained whole-body control. The framework uses onboard cameras to track the incoming object, predicts its future trajectory, and converts the estimated interception state into target end-effector and base-motion commands for whole-body execution by SONIC on the Unitree G1 humanoid robot. We evaluate the proposed framework in a vision-based humanoid tennis-hitting task. Experimental results demonstrate real-time visual tracking, trajectory prediction, and successful striking using purely onboard sensing.
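As an illustration of the physics-based trajectory prediction step, the following is a minimal sketch rather than the paper's implementation. It assumes a gravity-only ballistic model (no drag, spin, or bounce), a world frame with z pointing up, and a strike plane at a fixed x coordinate; the abstract does not specify any of these. The function names `fit_line`, `fit_ballistic`, and `predict_interception` are hypothetical. The sketch fits an initial position and velocity to a few tracked ball positions by least squares, then extrapolates to the time and point at which the ball crosses the strike plane.

```python
G = 9.81  # gravitational acceleration in m/s^2 (assumed z-up world frame)


def fit_line(ts, ys):
    """Ordinary least-squares fit of y = intercept + slope * t."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_y = sum(ys) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    slope = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys)) / var_t
    return mean_y - slope * mean_t, slope


def fit_ballistic(times, positions):
    """Estimate position and velocity at times[0] from tracked 3D points.

    Assumes gravity-only motion. Adding back 0.5*G*dt^2 on the z axis
    makes every axis linear in dt, so a line fit per axis suffices.
    Needs at least two samples with distinct timestamps.
    """
    t0 = times[0]
    dts = [t - t0 for t in times]
    p0, v0 = [], []
    for axis in range(3):
        ys = [p[axis] for p in positions]
        if axis == 2:
            ys = [y + 0.5 * G * dt * dt for y, dt in zip(ys, dts)]
        intercept, slope = fit_line(dts, ys)
        p0.append(intercept)
        v0.append(slope)
    return t0, p0, v0


def predict_interception(t0, p0, v0, strike_x):
    """Return (time, position) where the ball crosses x = strike_x.

    Returns None if the ball is not moving toward the plane.
    """
    if abs(v0[0]) < 1e-9:
        return None
    dt = (strike_x - p0[0]) / v0[0]
    if dt < 0:
        return None
    pos = [
        p0[0] + v0[0] * dt,
        p0[1] + v0[1] * dt,
        p0[2] + v0[2] * dt - 0.5 * G * dt * dt,
    ]
    return t0 + dt, pos


if __name__ == "__main__":
    # Synthetic observations standing in for onboard tracking output.
    times = [0.02 * i for i in range(6)]
    obs = [[3.0 - 5.0 * t, 0.1 + 0.2 * t, 1.0 + 2.0 * t - 0.5 * G * t * t]
           for t in times]
    t0, p0, v0 = fit_ballistic(times, obs)
    print(predict_interception(t0, p0, v0, strike_x=0.5))
```

In a pipeline of the kind the abstract describes, the predicted crossing time and position would be the interception state that is converted into end-effector and base-motion targets. A real system would likely also need to handle measurement noise, aerodynamic drag, and bounces, which this sketch omits.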