RO · AI · LG · Dec 7, 2019

Driving Style Encoder: Situational Reward Adaptation for General-Purpose Planning in Automated Driving

arXiv:1912.03509v2 · 3 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of reward function specification for automated driving planners, offering a more adaptive solution, though it is incremental as it builds on existing inverse reinforcement learning methods.

The paper tackles the manual design and tuning of linear reward functions for automated driving planners, a process that is tedious and produces functions that do not generalize across driving situations. It proposes a deep learning approach based on inverse reinforcement learning that generates situation-dependent reward functions. In the evaluation, the predicted reward functions perform on par with linear reward functions that have a-priori knowledge about the situation.

General-purpose planning algorithms for automated driving combine mission, behavior, and local motion planning. Such planning algorithms map features of the environment and driving kinematics into complex reward functions. To achieve this, planning experts often rely on linear reward functions. The specification and tuning of these reward functions is a tedious process and requires significant experience. Moreover, a manually designed linear reward function does not generalize across different driving situations. In this work, we propose a deep learning approach based on inverse reinforcement learning that generates situation-dependent reward functions. Our neural network provides a mapping between features and actions of sampled driving policies of a model-predictive control-based planner and predicts reward functions for upcoming planning cycles. In our evaluation, we compare the driving style of reward functions predicted by our deep network against clustered and linear reward functions. Our proposed deep learning approach outperforms clustered linear reward functions and is on par with linear reward functions with a-priori knowledge about the situation.
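
To make the idea of a linear reward over policy features concrete, the following is a minimal sketch and not the paper's implementation. It scores each sampled policy as a weighted sum of its features and picks the highest-scoring one. The feature names, feature values, and both weight vectors are hypothetical; the two hand-picked weight vectors merely stand in for situation-dependent rewards, which the paper predicts with a neural network.

```python
from typing import List, Sequence


def linear_reward(weights: Sequence[float], features: Sequence[float]) -> float:
    """Reward of one policy as a weighted sum of its feature values."""
    if len(weights) != len(features):
        raise ValueError("weights and features must have the same length")
    return sum(w * f for w, f in zip(weights, features))


def best_policy(weights: Sequence[float],
                policy_features: List[Sequence[float]]) -> int:
    """Index of the sampled policy with the highest linear reward."""
    rewards = [linear_reward(weights, f) for f in policy_features]
    return max(range(len(rewards)), key=rewards.__getitem__)


# Hypothetical features per sampled policy:
# [progress, lateral jerk, proximity to obstacles]
policy_features = [
    [1.0, 0.8, 0.1],  # fast but jerky
    [0.6, 0.1, 0.1],  # slower but smooth
]

# Hypothetical weight vectors standing in for two driving situations.
highway_weights = [1.0, -0.2, -1.0]  # progress dominates
urban_weights = [0.3, -1.0, -1.0]    # comfort and safety dominate

highway_choice = best_policy(highway_weights, policy_features)
urban_choice = best_policy(urban_weights, policy_features)
```

With these assumed numbers, the two weight vectors select different policies from the same sampled set, which illustrates why a single fixed weight vector may not suit every situation.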

