Evaluation of Large Language Models for Anomaly Detection in Autonomous Vehicles
This work addresses the need for better anomaly detection in autonomous vehicles, but its contribution is incremental: it applies existing LLM methods to a new domain-specific context.
This work tackles the problem of evaluating large language models (LLMs) for anomaly detection in autonomous vehicles by testing them on real-world edge cases where current systems fail, providing qualitative comparison results.
The rapid evolution of large language models (LLMs) has extended their reach to applications across many domains. Recently, the research community has started to evaluate their potential adoption in autonomous vehicles, especially as complementary modules in the perception and planning software stacks. However, their evaluation has been limited to synthetic datasets or manually driven datasets that lack ground-truth knowledge of how current perception and planning algorithms would perform in the cases under evaluation. For this reason, this work evaluates LLMs on real-world edge cases where current autonomous vehicles have been proven to fail. The proposed architecture consists of an open-vocabulary object detector coupled with prompt engineering and LLM contextual reasoning. We evaluate several state-of-the-art models against real edge cases and provide qualitative comparison results, along with a discussion of the findings on the potential application of LLMs as anomaly detectors in autonomous vehicles.
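As a rough illustration of the detector-plus-prompt pipeline described above, the following minimal sketch turns detector output into a text prompt and parses the model's answer. It is not the authors' implementation: the `Detection` schema, the prompt wording, the confidence threshold, and the `llm` callable are all hypothetical placeholders standing in for whichever open-vocabulary detector and LLM are used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Detection:
    """One object reported by an open-vocabulary detector (hypothetical schema)."""
    label: str
    confidence: float
    box: Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels


def build_prompt(detections: List[Detection], min_confidence: float = 0.3) -> str:
    """Describe the detected objects in text and ask the LLM to flag anomalies."""
    # Drop low-confidence detections (the threshold value is an assumption).
    kept = [d for d in detections if d.confidence >= min_confidence]
    lines = [
        f"- {d.label} (confidence {d.confidence:.2f}, box {d.box})" for d in kept
    ]
    scene = "\n".join(lines) if lines else "- no objects detected"
    return (
        "You are assisting an autonomous vehicle.\n"
        "Objects detected in the front camera image:\n"
        f"{scene}\n"
        "Is anything in this scene anomalous for normal driving? "
        "Answer 'ANOMALY: <reason>' or 'NORMAL'."
    )


def detect_anomaly(detections: List[Detection], llm: Callable[[str], str]) -> bool:
    """Query the LLM and parse its reply; `llm` is any text-in, text-out callable."""
    reply = llm(build_prompt(detections))
    return reply.strip().upper().startswith("ANOMALY")
```

In this sketch, `llm` can be any function that maps a prompt string to a reply string, so different models can be compared by swapping the callable while keeping the detector output and prompt fixed.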