Editable Scene Simulation for Autonomous Driving via Collaborative LLM-Agents
This work addresses inefficient user interaction and limited rendering quality in scene simulation for autonomous driving researchers and developers. The system as a whole is novel, but its individual technical components are incremental.
The paper tackles limitations in editable scene simulation for autonomous driving by introducing ChatSim, a system that uses natural language commands and external digital assets to generate photo-realistic 3D driving scenes. Experiments on the Waymo Open Dataset show that it can handle complex commands and produce realistic videos.
Scene simulation in autonomous driving has gained significant attention because of its huge potential for generating customized data. However, existing editable scene simulation approaches face limitations in user interaction efficiency, multi-camera photo-realistic rendering, and integration of external digital assets. To address these challenges, this paper introduces ChatSim, the first system that enables editable photo-realistic 3D driving scene simulation via natural language commands with external digital assets. To enable editing with high command flexibility, ChatSim leverages a large language model (LLM) agent collaboration framework. To generate photo-realistic outcomes, ChatSim employs a novel multi-camera neural radiance field method. Furthermore, to unleash the potential of extensive high-quality digital assets, ChatSim employs a novel multi-camera lighting estimation method to achieve scene-consistent asset rendering. Our experiments on the Waymo Open Dataset demonstrate that ChatSim can handle complex language commands and generate corresponding photo-realistic scene videos.