LG, RO | Dec 6, 2020

Amortized Q-learning with Model-based Action Proposals for Autonomous Driving on Highways

Branka Mirchevska, Maria Hügle, Gabriel Kalweit, Moritz Werling, Joschka Boedecker

arXiv:2012.03234v12.38 citations | Has Code

Originality: Incremental advance

AI Analysis

This work aims to improve long-term driving strategies for autonomous vehicles on highways by combining the strengths of RL and classical trajectory planning, offering an incremental improvement for the autonomous driving domain.

The paper addresses the challenge of long-term optimal decision-making in autonomous highway driving by integrating a Reinforcement Learning approach with a trajectory planner. This method learns an optimal long-term strategy while leveraging the benefits of classical short-term trajectory planning, achieving superior performance against four benchmark approaches in the SUMO simulator.

Well-established optimization-based methods can guarantee an optimal trajectory for a short optimization horizon, typically no longer than a few seconds. As a result, choosing the optimal trajectory for this short horizon may still result in a sub-optimal long-term solution. At the same time, the resulting short-term trajectories allow for effective, comfortable and provably safe maneuvers in a dynamic traffic environment. In this work, we address the question of how to ensure an optimal long-term driving strategy while keeping the benefits of classical trajectory planning. We introduce a Reinforcement Learning based approach that, coupled with a trajectory planner, learns an optimal long-term decision-making strategy for driving on highways. By generating locally optimal maneuvers online as actions, we strike a balance between the infinite low-level continuous action space and the limited flexibility of a fixed number of predefined standard lane-change actions. We evaluated our method on realistic scenarios in the open-source traffic simulator SUMO and achieved better performance than the four benchmark approaches we compared against: a random action-selecting agent, a greedy agent, an agent with high-level discrete actions, and an IDM-based SUMO-controlled agent.
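
The idea of choosing among maneuvers generated at decision time, rather than from a fixed action list or a raw continuous space, can be illustrated with a minimal Python sketch. This is not the authors' implementation: `Maneuver`, `propose_maneuvers`, `select_action` and `toy_q` are hypothetical names, the proposal generator is a toy stand-in for a real trajectory planner, and `toy_q` is a hand-written placeholder for a learned Q-function. No SUMO interface is involved.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

State = Tuple[int, float]  # (current lane index, current speed in m/s); an assumption


@dataclass(frozen=True)
class Maneuver:
    """Hypothetical stand-in for one planner-generated trajectory."""
    target_lane: int     # lane index at the end of the maneuver
    target_speed: float  # speed at the end of the maneuver, in m/s


def propose_maneuvers(state: State, num_lanes: int) -> List[Maneuver]:
    """Toy proposal generator: keep lane or move to an adjacent lane,
    combined with a small set of speed changes. A real system would
    query a trajectory planner here instead."""
    lane, speed = state
    proposals = []
    for lane_delta in (-1, 0, 1):
        new_lane = lane + lane_delta
        if not 0 <= new_lane < num_lanes:
            continue  # skip lanes that do not exist
        for speed_delta in (-2.0, 0.0, 2.0):
            proposals.append(Maneuver(new_lane, max(0.0, speed + speed_delta)))
    return proposals


def select_action(
    state: State,
    proposals: Sequence[Maneuver],
    q_value: Callable[[State, Maneuver], float],
    epsilon: float = 0.0,
    rng: Optional[random.Random] = None,
) -> Maneuver:
    """Epsilon-greedy choice over the finite proposal set."""
    if not proposals:
        raise ValueError("at least one proposal is required")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(list(proposals))  # explore
    return max(proposals, key=lambda m: q_value(state, m))  # exploit


def toy_q(state: State, maneuver: Maneuver) -> float:
    """Placeholder value estimate: prefer speed, penalize lane changes."""
    return maneuver.target_speed - 0.5 * abs(maneuver.target_lane - state[0])


state = (0, 20.0)
candidates = propose_maneuvers(state, num_lanes=3)
best = select_action(state, candidates, toy_q)
```

Because the maximization runs only over the proposals, the agent never has to search the continuous action space directly, and the proposal set can change with every traffic situation.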
