AI
Jun 6, 2024

Optimizing Autonomous Driving for Safety: A Human-Centric Approach with LLM-Enhanced RLHF

arXiv:2406.04481v1
49 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses safety optimization in autonomous driving, but appears incremental as it builds on existing RLHF and LLM methods without claiming major breakthroughs.

The paper tackles the problem of enhancing autonomous driving safety by combining Reinforcement Learning from Human Feedback (RLHF) and Large Language Models (LLMs) in a multi-agent simulation with human-controlled agents, and plans to validate the model using real-life testbed data from New Jersey and New York City.

Reinforcement Learning from Human Feedback (RLHF) is popular in large language models (LLMs), whereas traditional Reinforcement Learning (RL) often falls short. Current autonomous driving methods typically utilize either human feedback in machine learning, including RL, or LLMs. Most feedback guides the car agent's learning process (e.g., controlling the car). RLHF is usually applied in the fine-tuning step, requiring direct human "preferences," which are not commonly used in optimizing autonomous driving models. In this research, we innovatively combine RLHF and LLMs to enhance autonomous driving safety. Training a model with human guidance from scratch is inefficient. Our framework starts with a pre-trained autonomous car agent model and implements multiple human-controlled agents, such as cars and pedestrians, to simulate real-life road environments. The autonomous car model is not directly controlled by humans. We integrate both physical and physiological feedback to fine-tune the model, optimizing this process using LLMs. This multi-agent interactive environment ensures safe, realistic interactions before real-world application. Finally, we will validate our model using data gathered from real-life testbeds located in New Jersey and New York City.
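The abstract does not say how human "preferences" are turned into a training signal, so the following is only a minimal sketch of one common RLHF ingredient: a Bradley-Terry reward model fitted to pairwise preferences. It is not the authors' method. The two trajectory features (minimum gap to a pedestrian, peak braking intensity), the sample data, and every function name are hypothetical and chosen purely for illustration.

```python
import math


def preference_prob(r_a, r_b):
    """Bradley-Terry probability that segment A is preferred over segment B."""
    return 1.0 / (1.0 + math.exp(r_b - r_a))


def reward(weights, features):
    """Linear reward: dot product of weights and trajectory features."""
    return sum(w * x for w, x in zip(weights, features))


def fit_reward(pairs, n_features, lr=0.1, epochs=200):
    """Fit reward weights by gradient descent on the pairwise log-loss.

    pairs: list of (preferred_features, rejected_features) tuples.
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for good, bad in pairs:
            p = preference_prob(reward(w, good), reward(w, bad))
            # Gradient of -log(p) w.r.t. w is -(1 - p) * (good - bad),
            # so a descent step adds lr * (1 - p) * (good - bad).
            for i in range(n_features):
                w[i] += lr * (1.0 - p) * (good[i] - bad[i])
    return w


# Hypothetical features: [min gap to pedestrian (m), peak braking intensity].
# In each pair, the first segment is the one the human evaluator preferred.
pairs = [
    ([2.0, 0.3], [0.5, 0.8]),
    ([1.5, 0.2], [0.4, 0.9]),
    ([3.0, 0.4], [1.0, 0.7]),
]
weights = fit_reward(pairs, n_features=2)
```

With this toy data the fitted weights reward larger pedestrian gaps and penalize hard braking. In a fine-tuning loop, a learned reward of this kind would typically score the pre-trained driving policy's rollouts; how the paper's framework actually does this, and where the LLM enters, is not stated in the abstract.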

