Explanation for Trajectory Planning using Multi-modal Large Language Model for Autonomous Driving
This work addresses passenger anxiety in autonomous vehicles by improving the quality of the explanations the vehicle provides. The contribution is incremental, since it builds on existing captioning approaches and adds a new dataset.
The paper tackles the lack of interpretability in end-to-end autonomous driving models. It proposes a reasoning model that generates captions describing the ego vehicle's future behaviors and the reasons for them. The model takes future planning trajectories as inputs instead of momentary control signals, so that the captions better reflect the vehicle's plans.
End-to-end autonomous driving models have been developed recently. These models lack interpretability in the decision-making process from perception to control of the ego vehicle, which causes anxiety for passengers. One effective remedy is to build a model that outputs captions describing the future behaviors of the ego vehicle and the reasons for them. However, existing approaches generate reasoning text that inadequately reflects the future plans of the ego vehicle, because they train models to output captions using momentary control signals as inputs. In this study, we address this limitation by proposing a reasoning model that takes future planning trajectories of the ego vehicle as inputs, using a newly collected dataset.
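The contrast between the two input types can be illustrated with a minimal sketch. This is not the paper's implementation: the names `ControlSignal`, `Trajectory`, `control_to_text`, `trajectory_to_text`, and `lateral_shift`, the ego-frame coordinate convention, and the sample values are all hypothetical, chosen only to show why a momentary signal can miss a plan that a future trajectory reveals.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ControlSignal:
    """Momentary control state at the current time step (hypothetical fields)."""
    speed_mps: float
    steering_rad: float


# A future planning trajectory as (x, y) waypoints in the ego frame.
# Assumed convention: x is forward, y is to the left, both in metres.
Trajectory = List[Tuple[float, float]]


def control_to_text(signal: ControlSignal) -> str:
    """Serialise a momentary control signal into text for a model prompt."""
    return f"speed={signal.speed_mps:.1f} m/s, steering={signal.steering_rad:.2f} rad"


def trajectory_to_text(trajectory: Trajectory) -> str:
    """Serialise future waypoints into text for a model prompt."""
    return " ".join(f"({x:.1f},{y:.1f})" for x, y in trajectory)


def lateral_shift(trajectory: Trajectory) -> float:
    """Net leftward displacement between the first and last waypoint."""
    return trajectory[-1][1] - trajectory[0][1]


# The vehicle is driving straight right now, but its plan curves to the left.
now = ControlSignal(speed_mps=8.0, steering_rad=0.0)
plan: Trajectory = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.5), (14.0, 2.0), (17.0, 4.5)]

print(control_to_text(now))      # carries no hint of the upcoming turn
print(trajectory_to_text(plan))  # the leftward drift is visible in the waypoints
print(lateral_shift(plan))
```

In this toy case the steering angle is zero at the current step, so a caption conditioned only on the control signal has no evidence of the planned left turn, whereas the waypoint sequence encodes it directly.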