CV · Aug 17, 2025

LangVision-LoRA-NAS: Neural Architecture Search for Variable LoRA Rank in Vision Language Models

arXiv:2508.12512v1 · 1 citation · h-index: 19 · Has Code · ICIP
Originality: Incremental advance
AI Analysis

The paper addresses efficiency and flexibility issues in fine-tuning large VLMs for multimodal tasks, and represents an incremental improvement over standard fixed-rank LoRA methods.

The paper tackles the problem of fixed-rank LoRA fine-tuning in Vision Language Models by introducing LangVision-LoRA-NAS, which uses Neural Architecture Search to find LoRA rank configurations suited to specific tasks. The authors report improved performance and reduced fine-tuning costs with LLaMA-3.2-11B models on several datasets.

Vision Language Models (VLMs) integrate visual and text modalities to enable multimodal understanding and generation. These models typically combine a Vision Transformer (ViT) as an image encoder and a Large Language Model (LLM) for text generation. LoRA (Low-Rank Adaptation) is an efficient fine-tuning method that adapts pre-trained models to new tasks by introducing low-rank updates to their weights. While LoRA has emerged as a powerful technique for fine-tuning large models, current implementations assume a fixed rank, potentially limiting flexibility and efficiency across diverse tasks. This paper introduces LangVision-LoRA-NAS, a novel framework that integrates Neural Architecture Search (NAS) with LoRA to optimize VLMs for variable-rank adaptation. Our approach leverages NAS to dynamically search for the optimal LoRA rank configuration tailored to specific multimodal tasks, balancing performance and computational efficiency. Through extensive experiments using the LLaMA-3.2-11B model on several datasets, LangVision-LoRA-NAS demonstrates notable improvement in model performance while reducing fine-tuning costs. Our base and searched fine-tuned models on LLaMA-3.2-11B-Vision-Instruct can be found at https://huggingface.co/collections/krishnateja95/llama-32-11b-vision-instruct-langvision-lora-nas-6786cac480357a6a6fcc59ee and the code for LangVision-LoRA-NAS can be found at https://github.com/krishnateja95/LangVision-NAS.
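To make the fixed-rank versus variable-rank distinction in the abstract concrete, here is a minimal Python sketch of a LoRA update and of how the trainable-parameter count changes when each layer gets its own rank. It is an illustration under stated assumptions, not the paper's search procedure: the layer names, the layer shapes, and both rank configurations are hypothetical and are not taken from the paper or from LLaMA-3.2-11B.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters of one LoRA adapter.

    A has shape (rank x d_in) and B has shape (d_out x rank),
    so the adapter adds rank * (d_in + d_out) parameters.
    """
    return rank * (d_in + d_out)


def matmul(X, Y):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_delta(B, A, alpha):
    """Low-rank weight update delta_W = (alpha / rank) * B @ A."""
    rank = len(A)  # A has one row per rank dimension
    scale = alpha / rank
    return [[scale * v for v in row] for row in matmul(B, A)]


# Hypothetical (d_in, d_out) shapes, chosen only for illustration.
layers = {
    "q_proj": (4096, 4096),
    "v_proj": (4096, 1024),
    "mlp_up": (4096, 14336),
}

# Fixed-rank baseline: every layer uses the same rank.
fixed = {name: 16 for name in layers}

# Variable-rank configuration: made-up values, NOT searched results.
variable = {"q_proj": 8, "v_proj": 4, "mlp_up": 16}


def total_params(rank_config):
    """Sum adapter parameters over all layers for a given rank assignment."""
    return sum(lora_param_count(*layers[name], r) for name, r in rank_config.items())


fixed_total = total_params(fixed)
variable_total = total_params(variable)

# Tiny rank-1 example of the update itself.
delta = lora_delta(B=[[1], [2]], A=[[3, 4]], alpha=2)
```

In this toy setup the variable-rank assignment trains fewer adapter parameters than the fixed rank-16 baseline. A rank search of the kind the abstract describes would additionally score each candidate configuration on task performance, which this sketch does not attempt.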

Code Implementations: 1 repo