SignRAG: A Retrieval-Augmented System for Scalable Zero-Shot Road Sign Recognition
This work addresses the challenge of recognizing a large number of road sign classes without exhaustive labeled datasets, a practical need for intelligent transportation systems, though it is an incremental application of existing methods to a new domain.
The paper tackles automated road sign recognition with a zero-shot framework that adapts the Retrieval-Augmented Generation (RAG) paradigm, achieving 95.58% accuracy on ideal reference images and 82.45% on real-world road data.
Automated road sign recognition is a critical task for intelligent transportation systems, but traditional deep learning methods struggle with the sheer number of sign classes and the impracticality of creating exhaustive labeled datasets. This paper introduces a novel zero-shot recognition framework that adapts the Retrieval-Augmented Generation (RAG) paradigm to address this challenge. Our method first uses a Vision Language Model (VLM) to generate a textual description of a sign from an input image. This description is used to retrieve a small set of the most relevant sign candidates from a vector database of reference designs. Subsequently, a Large Language Model (LLM) reasons over the retrieved candidates to make a final, fine-grained recognition. We validate this approach on a comprehensive set of 303 regulatory signs from the Ohio MUTCD. Experimental results demonstrate the framework's effectiveness, achieving 95.58% accuracy on ideal reference images and 82.45% on challenging real-world road data. This work demonstrates the viability of RAG-based architectures for creating scalable and accurate systems for road sign recognition without task-specific training.
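The three stages described in the abstract (describe, retrieve, reason) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `describe_sign`, `retrieve`, and `choose_candidate` are hypothetical, the VLM and LLM calls are replaced by trivial placeholders, the bag-of-words similarity stands in for whatever learned embedding and vector database the paper uses, and the four reference entries and their descriptions are made up for illustration.

```python
import math
from collections import Counter

# Illustrative reference entries only; the paper's database covers 303
# regulatory signs and its descriptions are not reproduced here.
REFERENCE_SIGNS = {
    "R1-1": "red octagon with white text stop",
    "R1-2": "red and white downward triangle with text yield",
    "R2-1": "white rectangle with black text speed limit and a number",
    "R5-1": "white square with red circle and white bar text do not enter",
}


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def describe_sign(image):
    """Placeholder for the VLM step: a real system would caption the image."""
    return image["caption"]


def retrieve(description, k=3):
    """Return the codes of the k reference signs most similar to the description."""
    query = embed(description)
    scored = [(cosine(query, embed(text)), code)
              for code, text in REFERENCE_SIGNS.items()]
    scored.sort(reverse=True)
    return [code for _, code in scored[:k]]


def choose_candidate(description, candidates):
    """Placeholder for the LLM step: here it simply keeps the top-ranked candidate."""
    return candidates[0]


def recognize(image, k=3):
    """Describe the sign, retrieve k candidates, then pick one of them."""
    description = describe_sign(image)
    candidates = retrieve(description, k)
    return choose_candidate(description, candidates)


if __name__ == "__main__":
    sample = {"caption": "red octagon sign with white text stop"}
    print(retrieve(describe_sign(sample)))
    print(recognize(sample))
```

In this sketch, adding a new sign class only requires adding an entry to the reference store, which is the property that makes a retrieval-based design attractive when no task-specific training is performed. The final selection step is where a real system would prompt an LLM with the description and the retrieved candidates.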