CV | AI | MM | Jul 8, 2024

Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition

arXiv:2407.05814v1 | 8 citations | h-index: 26
Originality: Incremental advance
AI Analysis

This paper addresses cross-country traffic sign recognition for autonomous driving systems and represents an incremental improvement over existing MLLM approaches.

The paper tackles traffic sign recognition across different countries by proposing a cross-domain few-shot in-context learning method based on multimodal large language models. The method reduces dependence on training data and improves performance stability, and the authors report significant improvements on benchmark datasets from Germany and Belgium and on two real-world datasets from Japan.

Recent multimodal large language models (MLLMs) such as GPT-4o and GPT-4V have shown great potential in autonomous driving. In this paper, we propose a cross-domain few-shot in-context learning method based on the MLLM for enhancing traffic sign recognition (TSR). We first construct a traffic sign detection network based on Vision Transformer Adapter and an extraction module to extract traffic signs from the original road images. To reduce the dependence on training data and improve the performance stability of cross-country TSR, we introduce a cross-domain few-shot in-context learning method based on the MLLM. To enhance the MLLM's fine-grained recognition of traffic signs, the proposed method generates corresponding description texts using template traffic signs. These description texts contain key information about the shape, color, and composition of traffic signs, which can stimulate the ability of the MLLM to perceive fine-grained traffic sign categories. By using the description texts, our method reduces the cross-domain differences between template and real traffic signs. Our approach requires only simple and uniform textual indications, without the need for large-scale traffic sign images and labels. We perform comprehensive evaluations on the German traffic sign recognition benchmark dataset, the Belgium traffic sign dataset, and two real-world datasets taken from Japan. The experimental results show that our method significantly enhances the TSR performance.
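The description-text idea in the abstract can be pictured with a small sketch. The code below is only an illustration under stated assumptions, not the authors' implementation: the `TemplateSign` class, the `build_prompt` function, the prompt wording, and the `<cropped_sign_0>` placeholder are all hypothetical. It shows how shape, color, and composition descriptions of template signs might be assembled into a few-shot in-context prompt; no MLLM is actually called.

```python
from dataclasses import dataclass


@dataclass
class TemplateSign:
    """Hypothetical record for a template traffic sign (not from the paper)."""
    label: str
    shape: str
    color: str
    composition: str

    def describe(self) -> str:
        # Cover the three attributes the abstract names: shape, color, composition.
        return (f"{self.label}: shape is {self.shape}; color is {self.color}; "
                f"composition is {self.composition}.")


def build_prompt(templates: list[TemplateSign], query_image_ref: str) -> str:
    """Assemble an illustrative few-shot prompt from template descriptions."""
    lines = ["You are given descriptions of candidate traffic sign categories."]
    for i, t in enumerate(templates, start=1):
        lines.append(f"{i}. {t.describe()}")
    # The query would be a sign cropped by the detection/extraction stage;
    # here it is only a placeholder string (assumption).
    lines.append(f"Query image: {query_image_ref}")
    lines.append("Answer with the single best-matching category label.")
    return "\n".join(lines)


templates = [
    TemplateSign("stop", "octagon", "red with white border", "white text 'STOP'"),
    TemplateSign("yield", "inverted triangle", "white with red border", "no symbol"),
]
prompt = build_prompt(templates, "<cropped_sign_0>")
print(prompt)
```

In a real pipeline the placeholder would be replaced by the cropped sign image passed to the MLLM alongside the text, and the descriptions would presumably be generated from the template images rather than written by hand.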
