CVAISep 2, 2024

An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving

arXiv:2409.10554v1h-index: 2
Originality Incremental advance
AI Analysis

This addresses performance issues in autonomous driving for researchers and practitioners, though it is incremental as it builds on existing methods for representation learning.

The research tackles the challenge of partial observability in vision-based deep reinforcement learning for autonomous driving by using an offline-trained encoder from self-supervised learning on large video datasets, achieving zero-shot transfer to tasks like lane following and collision avoidance in the CARLA simulator.

Our research investigates the challenges Deep Reinforcement Learning (DRL) faces in complex, Partially Observable Markov Decision Processes (POMDP) such as autonomous driving (AD), and proposes a solution for vision-based navigation in these environments. Partial observability reduces RL performance significantly, and this can be mitigated by augmenting sensor information and data fusion to reflect a more Markovian environment. However, this necessitates an increasingly complex perception module, whose training via RL is complicated due to inherent limitations. As the neural network architecture becomes more complex, the reward function's effectiveness as an error signal diminishes since the only source of supervision is the reward, which is often noisy, sparse, and delayed. Task-irrelevant elements in images, such as the sky or certain objects, pose additional complexities. Our research adopts an offline-trained encoder to leverage large video datasets through self-supervised learning to learn generalizable representations. Then, we train a head network on top of these representations through DRL to learn to control an ego vehicle in the CARLA AD simulator. This study presents a broad investigation of the impact of different learning schemes for offline-training of encoders on the performance of DRL agents in challenging AD tasks. Furthermore, we show that the features learned by watching BDD100K driving videos can be directly transferred to achieve lane following and collision avoidance in CARLA simulator, in a zero-shot learning fashion. Finally, we explore the impact of various architectural decisions for the RL networks to utilize the transferred representations efficiently. Therefore, in this work, we introduce and validate an optimal way for obtaining suitable representations of the environment, and transferring them to RL networks.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes