Closing the Planning-Learning Loop with Application to Autonomous Driving
This work addresses real-time planning for autonomous vehicles in complex urban environments, with the aim of improving decision-making and safety in dense traffic.
This paper introduces LeTS-Drive, an algorithm that integrates planning and learning in a closed loop to address real-time planning under uncertainty for autonomous driving. It learns a policy and a value function from an online planner, which in turn uses them as heuristics to improve its real-time performance. The system outperforms planning alone, learning alone, and open-loop integration of the two.
Real-time planning under uncertainty is critical for robots operating in complex dynamic environments. Consider, for example, an autonomous robot vehicle driving in dense, unregulated urban traffic of cars, motorcycles, buses, etc. To drive effectively, the robot vehicle has to plan over both short and long time horizons in order to interact with many traffic participants whose intentions are uncertain. Planning explicitly over a long time horizon, however, incurs prohibitive computational costs and is impractical under real-time constraints. To achieve real-time performance for large-scale planning, this work introduces a new algorithm, Learning from Tree Search for Driving (LeTS-Drive), which integrates planning and learning in a closed loop, and applies it to autonomous driving in crowded urban traffic in simulation. Specifically, LeTS-Drive learns a policy and its value function from data provided by an online planner, which searches a sparsely sampled belief tree; the online planner in turn uses the learned policy and value function as heuristics to scale up its run-time performance for real-time robot control. These two steps are repeated to form a closed loop so that the planner and the learner inform each other and improve in synchrony. The algorithm learns on its own in a self-supervised manner, without human effort on explicit data labeling. Experimental results demonstrate that LeTS-Drive outperforms either planning or learning alone, as well as open-loop integration of planning and learning.
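The closed loop described above can be illustrated with a minimal toy sketch. It is not the LeTS-Drive implementation: the belief-tree search is replaced by a depth-limited lookahead, the learned policy and value function are plain lookup tables rather than trained networks, and the driving task is replaced by a hypothetical one-dimensional corridor. All names (`GOAL`, `step`, `plan`, `closed_loop`) are assumptions made for illustration only.

```python
# Hypothetical toy task: states 0..GOAL on a line, reward 1.0 on reaching GOAL.
GOAL = 6
ACTIONS = (-1, +1)
GAMMA = 0.95


def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def plan(state, policy, value, depth):
    """Depth-limited lookahead standing in for the online planner.

    The learned value table scores the leaves, and the learned policy
    decides which action is tried first (so it wins ties).
    Returns (best_action, estimated_value).
    """
    ordered = sorted(ACTIONS, key=lambda a: a != policy[state])
    best_action, best_q = None, float("-inf")
    for a in ordered:
        nxt, r, done = step(state, a)
        if done:
            q = r
        elif depth == 1:
            q = r + GAMMA * value[nxt]  # learned value as leaf heuristic
        else:
            q = r + GAMMA * plan(nxt, policy, value, depth - 1)[1]
        if q > best_q:
            best_action, best_q = a, q
    return best_action, best_q


def closed_loop(iterations, depth):
    """Alternate planning and learning; returns the final (policy, value)."""
    policy = {s: ACTIONS[0] for s in range(GOAL)}   # uninformed start
    value = {s: 0.0 for s in range(GOAL + 1)}
    for _ in range(iterations):
        # 1) The planner, guided by the current heuristics, generates targets.
        data = {s: plan(s, policy, value, depth) for s in range(GOAL)}
        # 2) The learner fits the planner's output (here: a table overwrite).
        for s, (a, v) in data.items():
            policy[s], value[s] = a, v
    return policy, value


if __name__ == "__main__":
    policy, value = closed_loop(iterations=3, depth=2)
    print(policy)
    print(value)
```

In this toy setting, a depth-2 planner with uninformed heuristics sees no reward from state 0 and cannot choose a useful action there. After three rounds of the loop, the learned value table has carried the goal reward back to every state, so the same shallow planner selects the action toward the goal everywhere. This mirrors, in a much simplified form, how learned heuristics let a short-horizon search stand in for a long-horizon one.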