LG · Jan 9, 2025

DriVLM: Domain Adaptation of Vision-Language Models in Autonomous Driving

arXiv:2501.05081v14.11 citations · h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses computational barriers for researchers and developers in autonomous driving, but it appears incremental as it focuses on applying existing small-scale models to a new domain.

The paper tackles the high computational cost of multimodal large language models (MLLMs) by exploring small-scale MLLMs and applying them to autonomous driving, with the aim of advancing real-world applications.

In recent years, large language models have achieved impressive performance, contributing substantially to the development and application of artificial intelligence, and both their parameter counts and their performance are still growing rapidly. In particular, multimodal large language models (MLLMs) can combine multiple modalities such as images, video, audio, and text, and have great potential across a variety of tasks. However, most MLLMs require very high computational resources, which is a major challenge for most researchers and developers. In this paper, we explore the utility of small-scale MLLMs and apply them to the field of autonomous driving. We hope that this will advance the application of MLLMs in real-world scenarios.
