Learning to Ball: Composing Policies for Long-Horizon Basketball Moves
This work addresses a challenge in reinforcement learning for robotics and simulation domains: tasks that involve distinct phases with ill-defined intermediate states. The contribution appears incremental, however, because it builds on existing policy composition methods.
The paper tackles the problem of learning control policies for multi-phase, long-horizon tasks such as basketball maneuvers. It introduces a policy integration framework and a high-level soft router that enable seamless transitions between subtasks. The resulting policies effectively control a simulated character to accomplish these tasks without relying on ball trajectory references.
Learning a control policy for a multi-phase, long-horizon task, such as basketball maneuvers, remains challenging for reinforcement learning approaches due to the need for seamless policy composition and transitions between skills. A long-horizon task typically consists of distinct subtasks with well-defined goals, separated by transitional subtasks that have unclear goals but are critical to the success of the entire task. Existing methods such as mixture of experts and skill chaining struggle with tasks where individual policies do not share significant commonly explored states or lack well-defined initial and terminal states between different phases. In this paper, we introduce a novel policy integration framework to enable the composition of drastically different motor skills in multi-phase long-horizon tasks with ill-defined intermediate states. Building on this framework, we further introduce a high-level soft router to enable seamless and robust transitions between the subtasks. We evaluate our framework on a set of fundamental basketball skills and challenging transitions. Policies trained by our approach can effectively control the simulated character to interact with the ball and accomplish the long-horizon task specified by real-time user commands, without relying on ball trajectory references.
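
To make the idea of soft routing between subtask policies concrete, the following is a minimal, generic sketch and not the paper's architecture, which the abstract does not specify. It assumes a router that outputs one logit per subtask policy and blends the policies' actions with softmax weights. The names `soft_route`, `dribble_policy`, `shoot_policy`, and `toy_router`, and the one-dimensional "progress" state feature, are all hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Policy = Callable[[Sequence[float]], Vector]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> Vector:
    """Numerically stable softmax; a lower temperature gives sharper weights."""
    peak = max(logits)
    exps = [math.exp((x - peak) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_route(
    state: Sequence[float],
    policies: Sequence[Policy],
    router: Callable[[Sequence[float]], Sequence[float]],
    temperature: float = 1.0,
) -> Tuple[Vector, Vector]:
    """Blend the actions of all subtask policies using router weights.

    Assumption: every policy returns an action vector of the same length.
    """
    weights = softmax(router(state), temperature)
    actions = [policy(state) for policy in policies]
    dim = len(actions[0])
    blended = [
        sum(w * action[i] for w, action in zip(weights, actions))
        for i in range(dim)
    ]
    return blended, weights


# Hypothetical stand-ins for trained subtask policies (fixed actions here).
def dribble_policy(state: Sequence[float]) -> Vector:
    return [1.0, 0.0]


def shoot_policy(state: Sequence[float]) -> Vector:
    return [0.0, 1.0]


def toy_router(state: Sequence[float]) -> Vector:
    # state[0] is an assumed "progress toward the shot" feature in [0, 1].
    gain = 8.0
    offset = state[0] - 0.5
    return [-gain * offset, gain * offset]


if __name__ == "__main__":
    for progress in (0.0, 0.25, 0.5, 0.75, 1.0):
        action, weights = soft_route(
            [progress], [dribble_policy, shoot_policy], toy_router
        )
        print(progress, [round(w, 3) for w in weights], [round(a, 3) for a in action])
```

Because the weights vary continuously with the state, the blended action shifts gradually from one skill to the other instead of switching abruptly at a hand-picked boundary, which is the general motivation for soft rather than hard routing when intermediate states are ill-defined.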