SE AI | Apr 2, 2025

On Simulation-Guided LLM-based Code Generation for Safe Autonomous Driving Software

Ali Nouri, Johan Andersson, Kailash De Jesus Hornig, Zhennan Fei, Emil Knabe, Hakan Sivencrona, Beatriz Cabrero-Daniel, Christian Berger

arXiv:2504.02141v1 | 8.06 citations | h-index: 6 | EASE

Originality: Synthesis-oriented

AI Analysis

This work addresses the time- and resource-intensive DevOps processes in autonomous driving software development, though it appears incremental, as it builds on existing LLM and simulation methods.

The study tackles the challenge of automating code generation for safety-critical autonomous driving software. It develops a prototype that uses LLMs to generate code, simulates that code in traffic scenarios, and feeds the results back to the LLM for improvement. The prototype was evaluated with multiple LLMs and assessed by experts at two OEMs.

An Automated Driving System (ADS) is a safety-critical software system responsible for interpreting the vehicle's environment and making decisions accordingly. The unbounded complexity of the driving context, including unforeseeable events, necessitates continuous improvement, often achieved through iterative DevOps processes. However, DevOps processes are themselves complex, making these improvements both time- and resource-intensive. Automation in code generation for ADS using Large Language Models (LLMs) is one potential approach to address this challenge. Nevertheless, the development of ADS requires rigorous processes to verify, validate, assess, and qualify the code before it can be deployed in the vehicle and used. In this study, we developed and evaluated a prototype for automatic code generation and assessment using a designed pipeline of an LLM-based agent, a simulation model, and a rule-based feedback generator in an industrial setup. The LLM-generated code is evaluated automatically in a simulation model against multiple critical traffic scenarios, and an assessment report is provided as feedback to the LLM for modification or bug fixing. We report the experimental results of the prototype employing Codellama:34b, DeepSeek (r1:32b and Coder:33b), CodeGemma:7b, Mistral:7b, and GPT4 for Adaptive Cruise Control (ACC) and Unsupervised Collision Avoidance by Evasive Manoeuvre (CAEM). Finally, we assessed the tool with 11 experts at two Original Equipment Manufacturers (OEMs) through an interview study.
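
The abstract describes the pipeline only at a high level, so the following is a minimal Python sketch of what a generate, simulate, and feedback loop of this kind could look like. It is not the authors' implementation. All names (`ScenarioResult`, `rule_based_feedback`, `refine`, the stub functions) and the 2.0 m gap threshold are hypothetical, and the LLM and the simulator are replaced by toy stand-ins so that the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ScenarioResult:
    """Outcome of one simulated traffic scenario (hypothetical structure)."""
    scenario: str
    min_gap_m: float  # smallest distance to the lead vehicle during the run


def rule_based_feedback(results: List[ScenarioResult],
                        min_safe_gap_m: float = 2.0) -> List[str]:
    """Turn simulation results into textual findings using one fixed rule.

    The 2.0 m threshold is an assumption chosen for illustration only.
    """
    return [
        f"Scenario '{r.scenario}': minimum gap {r.min_gap_m:.1f} m "
        f"is below the {min_safe_gap_m:.1f} m threshold."
        for r in results
        if r.min_gap_m < min_safe_gap_m
    ]


def refine(generate: Callable[[str, List[str]], str],
           simulate: Callable[[str], List[ScenarioResult]],
           task: str,
           max_iterations: int = 3) -> Tuple[str, int, bool]:
    """Generate code, simulate it, and feed findings back until it passes.

    Returns the last generated code, the number of iterations used, and
    whether all scenarios passed the rule check.
    """
    feedback: List[str] = []
    code = ""
    for iteration in range(1, max_iterations + 1):
        code = generate(task, feedback)   # an LLM call in a real pipeline
        results = simulate(code)          # a traffic simulator in a real pipeline
        feedback = rule_based_feedback(results)
        if not feedback:
            return code, iteration, True
    return code, max_iterations, False


def stub_generate(task: str, feedback: List[str]) -> str:
    """Toy stand-in for an LLM: picks a larger time gap once it gets feedback."""
    return "time_gap_s = 2.0" if feedback else "time_gap_s = 0.5"


def stub_simulate(code: str) -> List[ScenarioResult]:
    """Toy stand-in for a simulator: derives gaps from the configured time gap."""
    time_gap_s = float(code.split("=")[1])
    return [
        ScenarioResult("lead vehicle brakes", min_gap_m=time_gap_s * 3.0),
        ScenarioResult("cut-in", min_gap_m=time_gap_s * 2.0),
    ]


if __name__ == "__main__":
    final_code, iterations, passed = refine(stub_generate, stub_simulate, "ACC")
    print(final_code, iterations, passed)
```

In this sketch the loop stops as soon as the rule check produces no findings, or after a fixed number of attempts. A real setup would replace the stubs with an actual model call and a simulation run, and would likely use a richer assessment report than a single distance rule.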
