Multi-Agent Vulnerability Discovery for Autonomous Driving with Hazard Arbitration Reward
This work addresses the costly and responsibility-attribution challenges in testing autonomous driving systems, offering a method to improve safety through targeted vulnerability discovery.
The paper tackles the problem of efficiently discovering hazardous scenarios for autonomous driving policies by proposing STARS, a multi-agent reinforcement learning framework with Hazard Arbitration Reward, which effectively identifies autonomous-vehicle responsible vulnerabilities in four driving policies across three environments.
Discovering hazardous scenarios is crucial in testing and further improving driving policies. However, conducting efficient driving policy testing faces two key challenges. On the one hand, the probability of naturally encountering hazardous scenarios is low when testing a well-trained autonomous driving strategy. Thus, discovering these scenarios by purely real-world road testing is extremely costly. On the other hand, a proper determination of accident responsibility is necessary for this task. Collecting scenarios with wrong-attributed responsibilities will lead to an overly conservative autonomous driving strategy. To be more specific, we aim to discover hazardous scenarios that are autonomous-vehicle responsible (AV-responsible), i.e., the vulnerabilities of the under-test driving policy. To this end, this work proposes a Safety Test framework by finding Av-Responsible Scenarios (STARS) based on multi-agent reinforcement learning. STARS guides other traffic participants to produce Av-Responsible Scenarios and make the under-test driving policy misbehave via introducing Hazard Arbitration Reward (HAR). HAR enables our framework to discover diverse, complex, and AV-responsible hazardous scenarios. Experimental results against four different driving policies in three environments demonstrate that STARS can effectively discover AV-responsible hazardous scenarios. These scenarios indeed correspond to the vulnerabilities of the under-test driving policies, thus are meaningful for their further improvements.