RO · AI · Jul 22, 2024

Importance Sampling-Guided Meta-Training for Intelligent Agents in Highly Interactive Environments

Mansur Arief, Mike Timmerman, Jiachen Li, David Isele, Mykel J Kochenderfer

arXiv:2407.15839v2 · 8.34 citations · h-index: 23

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving agent robustness in common driving scenarios for autonomous systems. It is an incremental advance because it builds on existing guided meta RL methods.

The study tackles the problem of training intelligent agents for highly interactive driving environments by introducing a framework that integrates guided meta reinforcement learning with importance sampling. The result is faster training and better performance on tasks such as T-intersections and roundabouts.

Training intelligent agents to navigate highly interactive environments presents significant challenges. The guided meta reinforcement learning (RL) approach, which first trains a guiding policy and then uses it to train the ego agent, has proven effective in improving generalizability across scenarios with various levels of interaction. However, the state-of-the-art method tends to be overly sensitive to extreme cases, which impairs the agents' performance in more common scenarios. This study introduces a novel training framework that integrates guided meta RL with importance sampling (IS) to iteratively optimize training distributions for navigating highly interactive driving scenarios, such as T-intersections and roundabouts. Traditional methods may underrepresent critical interactions or overemphasize extreme cases during training. In contrast, our approach strategically shifts the training distribution toward more challenging driving behaviors using IS proposal distributions and applies the importance ratio to de-bias the result. By estimating a naturalistic distribution from real-world datasets and employing a mixture model for iterative training refinements, the framework ensures a balanced focus on both common and extreme driving scenarios. Experiments with synthetic and naturalistic datasets demonstrate accelerated training and improved performance on highly interactive driving tasks.
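The abstract describes oversampling challenging behaviors with an IS proposal and then de-biasing with the importance ratio. Below is a minimal sketch of that de-biasing step. It is not the authors' implementation. It rests on three assumptions that do not come from the paper: the naturalistic distribution p is a 1-D Gaussian over a hypothetical "aggressiveness" parameter of the other driver, the "challenging" event is a hypothetical threshold on that parameter, and the proposal q is a two-component Gaussian mixture. The helper names normal_pdf, GaussianMixture and is_estimate are likewise hypothetical. The sketch shows only one sampling round, not the paper's iterative refinement.

```python
import math
import random


def normal_pdf(x: float, mean: float, std: float) -> float:
    """Density of a 1-D Gaussian N(mean, std^2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


class GaussianMixture:
    """1-D Gaussian mixture, used here as a hypothetical IS proposal q."""

    def __init__(self, weights, means, stds):
        total = sum(weights)
        self.weights = [w / total for w in weights]  # normalize mixture weights
        self.means = list(means)
        self.stds = list(stds)

    def sample(self, rng: random.Random) -> float:
        # Pick a component in proportion to its weight, then draw from it.
        k = rng.choices(range(len(self.weights)), weights=self.weights)[0]
        return rng.gauss(self.means[k], self.stds[k])

    def pdf(self, x: float) -> float:
        return sum(
            w * normal_pdf(x, m, s)
            for w, m, s in zip(self.weights, self.means, self.stds)
        )


def is_estimate(f, p_pdf, proposal, n, rng):
    """Unbiased IS estimate of E_p[f(X)] from n samples drawn from q.

    Each sample is re-weighted by the importance ratio p(x) / q(x), which
    removes the bias introduced by oversampling challenging behaviors.
    """
    total = 0.0
    for _ in range(n):
        x = proposal.sample(rng)
        total += f(x) * p_pdf(x) / proposal.pdf(x)
    return total / n


def p_pdf(x: float) -> float:
    # Assumed naturalistic model: aggressiveness ~ N(0, 1).
    return normal_pdf(x, 0.0, 1.0)


def is_hard(x: float) -> float:
    # Assumed "challenging" event: aggressiveness above 2 (rare under p).
    return 1.0 if x > 2.0 else 0.0


# Mixture proposal: half naturalistic mass, half shifted toward hard cases.
proposal = GaussianMixture(weights=[0.5, 0.5], means=[0.0, 2.5], stds=[1.0, 1.0])

rng = random.Random(0)
estimate = is_estimate(is_hard, p_pdf, proposal, n=20_000, rng=rng)
exact = 0.5 * math.erfc(2.0 / math.sqrt(2.0))  # P(X > 2) under N(0, 1)
print(f"IS estimate {estimate:.4f} vs exact {exact:.4f}")
```

Keeping the naturalistic distribution as one mixture component means q(x) >= 0.5 p(x) everywhere in this sketch. The importance ratio is therefore bounded by 2, so common cases stay represented and no single sample can dominate the estimate. This matches the abstract's stated goal of balancing common and extreme scenarios, though the paper's actual mixture and update rule may differ.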

