RO AI LG MA Sep 18, 2023

Multi-Agent Deep Reinforcement Learning for Cooperative and Competitive Autonomous Vehicles using AutoDRIVE Ecosystem

Tanmay Vilas Samak, Chinmay Vilas Samak, Venkat Krovi

arXiv:2309.10007v21.91 citations h-index: 27

Originality: Synthesis-oriented

AI Analysis

This work addresses multi-agent coordination problems for autonomous vehicles, presenting incremental improvements by applying existing methods to new scenarios within the AutoDRIVE Ecosystem.

The paper tackles cooperative intersection traversal and competitive autonomous racing using a multi-agent deep reinforcement learning framework, achieving robust training and deployment in stochastic environments with sparse observations and kinodynamic constraints.

This work presents a modular and parallelizable multi-agent deep reinforcement learning framework for instilling cooperative as well as competitive behaviors in autonomous vehicles. We introduce AutoDRIVE Ecosystem as an enabler to develop physically accurate and graphically realistic digital twins of Nigel and F1TENTH, two scaled autonomous vehicle platforms with unique qualities and capabilities, and leverage this ecosystem to train and deploy multi-agent reinforcement learning policies. We first investigate an intersection traversal problem using a set of cooperative vehicles (Nigel) that share limited state information with each other in single as well as multi-agent learning settings using a common policy approach. We then investigate an adversarial head-to-head autonomous racing problem using a different set of vehicles (F1TENTH) in a multi-agent learning setting using an individual policy approach. In both sets of experiments, a decentralized learning architecture was adopted, which allowed robust training and testing of the approaches in stochastic environments, since the agents were mutually independent and exhibited asynchronous motion behavior. The problems were made more challenging by providing the agents with sparse observation spaces and requiring them to sample control commands that implicitly satisfied the imposed kinodynamic as well as safety constraints. The experimental results for both problem statements are reported in terms of quantitative metrics and qualitative remarks for training as well as deployment phases.
