Building Safer Autonomous Agents by Leveraging Risky Driving Behavior Knowledge
This work addresses the challenge of creating safer autonomous agents for real-world driving by incorporating risky behavior knowledge, although the contribution is incremental, as it builds on existing simulation and reinforcement learning methods.
The study tackles the problem of training autonomous driving agents in simulation environments that lack realistic risky driving behaviors. It does so by systematically generating risk-prone scenarios with heavy traffic and unexpected random behavior. Model-free learning agents trained with these scenarios outperformed baseline agents, and the authors measured how much of that improvement is attributable to the added scenarios.
Simulation environments are useful for learning driving tasks such as lane changing, parking, and handling intersections in an abstract manner. However, these environments often restrict themselves to conservative interaction behavior among vehicles. Real driving, by contrast, often involves high-risk scenarios in which other drivers do not behave as expected, for reasons such as fatigue or inexperience. Simulation environments do not take this into account when training the navigation agent. In this study, we therefore focus on systematically creating such risk-prone scenarios, with heavy traffic and unexpected random behavior, to build better model-free learning agents. We generate multiple autonomous driving scenarios by creating new custom Markov Decision Process (MDP) environment iterations in the highway-env simulation package. The behavior policy is learned by agents trained with deep reinforcement learning models and is designed to handle collisions and risky, randomized driver behavior. We train model-free learning agents with supplementary information from risk-prone driving scenarios and compare their performance with baseline agents. Finally, we causally measure the impact of adding these perturbations to the training process, to precisely account for the performance improvement obtained from learning on these scenarios.
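The abstract does not say exactly how risky behavior is injected into the highway-env scenarios. As a minimal sketch, assuming each MDP variant is parameterized by a traffic density and a per-step probability that a surrounding driver deviates from its nominal (conservative) policy, scenario generation and behavior perturbation might look like the following. `ScenarioConfig`, `make_scenarios`, `perturbed_action`, the parameter ranges, and the action names are all hypothetical illustrations, not the authors' code and not the highway-env API.

```python
import random
from dataclasses import dataclass

# Hypothetical discrete action set for surrounding (non-ego) vehicles.
ACTIONS = ("KEEP_LANE", "LANE_LEFT", "LANE_RIGHT", "FASTER", "SLOWER")


@dataclass
class ScenarioConfig:
    """Hypothetical knobs describing one risk-prone MDP variant."""
    vehicles_count: int  # traffic density: number of surrounding vehicles
    risky_prob: float    # per-step chance a driver ignores its nominal policy


def make_scenarios(n, seed=0, min_vehicles=20, max_vehicles=60, max_risk=0.3):
    """Sample n scenario variants, from near-nominal to heavy, risky traffic.

    The ranges are illustrative assumptions, not values from the study.
    """
    rng = random.Random(seed)  # seeded so the scenario set is reproducible
    return [
        ScenarioConfig(
            vehicles_count=rng.randint(min_vehicles, max_vehicles),
            risky_prob=rng.uniform(0.0, max_risk),
        )
        for _ in range(n)
    ]


def perturbed_action(nominal_action, risky_prob, rng):
    """Return the driver's nominal action, or a random one with prob. risky_prob.

    This models an 'unexpected' driver (e.g. tired or inexperienced) who
    occasionally acts outside the conservative behavior model.
    """
    if rng.random() < risky_prob:
        return rng.choice(ACTIONS)
    return nominal_action


if __name__ == "__main__":
    rng = random.Random(42)
    for cfg in make_scenarios(3):
        # Simulate one step for a single surrounding driver in this variant.
        action = perturbed_action("KEEP_LANE", cfg.risky_prob, rng)
        print(cfg, "->", action)
```

In a real setup, each sampled configuration would be used to build one custom environment iteration, and the ego agent would be trained across the whole set so that it sees both nominal and risky traffic; comparing against an agent trained only on nominal settings then gives the baseline comparison the abstract describes.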