CV | AI | CL | Sep 14, 2025

The System Description of CPS Team for Track on Driving with Language of CVPR 2024 Autonomous Grand Challenge

arXiv:2509.11071v1 | 2 citations | h-index: 1 | Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses autonomous driving challenges by improving language-based reasoning for driving tasks, representing an incremental advancement in a specific competition setting.

The authors tackled the Driving with Language track by developing a vision-language model system based on LLaVA, enhanced with fine-tuning and depth integration, achieving a top score of 0.7799 on the validation set leaderboard.

This report describes our vision-language model systems for the Driving with Language track of the CVPR 2024 Autonomous Grand Challenge. We trained our models exclusively on the DriveLM-nuScenes dataset. Our systems are built on LLaVA models, which we fine-tuned with the LoRA and DoRA methods. We also integrated depth information from open-source depth estimation models into both training and inference. At inference time, particularly for multiple-choice and yes/no questions, we adopted a Chain-of-Thought reasoning approach to improve accuracy. With this methodology we achieved a top score of 0.7799 on the validation set leaderboard, ranking 1st.
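
As a rough illustration of the two fine-tuning methods named in the abstract, the sketch below contrasts the standard LoRA and DoRA weight updates on a toy 2x2 matrix in plain Python. It is not the authors' code: the matrix values, rank, `alpha`, and all function names are assumptions chosen for illustration, and the report does not state which layers or hyperparameters were used. The sketch also assumes the common `alpha / rank` scaling of the low-rank update.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def column_norms(w):
    """L2 norm of each column of w."""
    return [math.sqrt(sum(row[j] ** 2 for row in w))
            for j in range(len(w[0]))]

def lora_weight(w0, b, a, alpha, rank):
    """LoRA: W = W0 + (alpha / rank) * B @ A, with W0 kept frozen."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w0[i][j] + scale * delta[i][j] for j in range(len(w0[0]))]
            for i in range(len(w0))]

def dora_weight(w0, b, a, alpha, rank, magnitude):
    """DoRA: W = m * V / ||V||_c, where V = W0 + (alpha / rank) * B @ A.

    The direction V is updated with a LoRA-style low-rank term, while the
    per-column magnitude vector m is a separate trainable parameter.
    """
    v = lora_weight(w0, b, a, alpha, rank)
    norms = column_norms(v)
    return [[magnitude[j] * v[i][j] / norms[j] for j in range(len(v[0]))]
            for i in range(len(v))]

# Toy frozen weight (hypothetical values), rank-1 adapters.
w0 = [[3.0, 0.0],
      [4.0, 2.0]]
rank, alpha = 1, 1.0
a = [[0.5, -0.5]]          # shape (rank, d_in)
b_init = [[0.0], [0.0]]    # shape (d_out, rank); B starts at zero
m_init = column_norms(w0)  # DoRA initialises m to the column norms of W0

# At initialisation both methods reproduce the pretrained weight.
w_lora_init = lora_weight(w0, b_init, a, alpha, rank)
w_dora_init = dora_weight(w0, b_init, a, alpha, rank, m_init)

# After a (made-up) update to B, LoRA changes column norms freely,
# whereas DoRA keeps each column's norm equal to its magnitude entry.
b_trained = [[1.0], [0.0]]
w_lora = lora_weight(w0, b_trained, a, alpha, rank)
w_dora = dora_weight(w0, b_trained, a, alpha, rank, m_init)
```

The point of the comparison is that DoRA separates how large each weight column is from which way it points, so the low-rank term only steers direction while `magnitude` is learned on its own.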

