CL · Feb 25, 2025

Detecting Knowledge Boundary of Vision Large Language Models by Sampling-Based Inference

arXiv:2502.18023v3 · 8 citations · h-index: 29 · Has Code · EMNLP
Originality: Incremental advance
AI Analysis

This work addresses the challenge of optimizing retrieval-augmented generation for VLLMs, offering a more efficient approach for handling knowledge-intensive queries, though it is incremental as it builds on existing RAG techniques.

The paper tackles the problem of inefficient retrieval in Vision Large Language Models (VLLMs) by proposing a method to detect their knowledge boundaries, enabling reduced retrieval while maintaining or improving performance on Visual Question Answering datasets.

Despite the advancements made in Vision Large Language Models (VLLMs), like text Large Language Models (LLMs), they have limitations in addressing questions that require real-time information or are knowledge-intensive. Indiscriminately adopting Retrieval Augmented Generation (RAG) techniques is an effective yet expensive way to enable models to answer queries beyond their knowledge scopes. To mitigate the dependence on retrieval and simultaneously maintain, or even improve, the performance benefits provided by retrieval, we propose a method to detect the knowledge boundary of VLLMs, allowing for more efficient use of techniques like RAG. Specifically, we propose a method with two variants that fine-tune a VLLM on an automatically constructed dataset for boundary identification. Experimental results on various types of Visual Question Answering datasets show that our method successfully depicts a VLLM's knowledge boundary, based on which we are able to reduce indiscriminate retrieval while maintaining or improving the performance. In addition, we show that the knowledge boundary identified by our method for one VLLM can be used as a surrogate boundary for other VLLMs. Code will be released at https://github.com/Chord-Chen-30/VLLM-KnowledgeBoundary
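The abstract describes the idea only at a high level, so the following is a minimal sketch of how boundary labels and retrieval gating might fit together, not the authors' implementation. It assumes that a query counts as inside the knowledge boundary when the model answers it correctly in at least a chosen fraction of sampled generations. The function names, the exact-match scoring, and the `n_samples` and `threshold` parameters are all hypothetical, and the image input of a VQA query is folded into an opaque `question` value for brevity.

```python
from typing import Callable


def sampled_accuracy(
    sample_answer: Callable[[str], str],
    question: str,
    gold: str,
    n_samples: int = 8,
) -> float:
    """Fraction of sampled answers that match the gold answer (exact match, case-insensitive)."""
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    gold_norm = gold.strip().lower()
    hits = sum(
        sample_answer(question).strip().lower() == gold_norm
        for _ in range(n_samples)
    )
    return hits / n_samples


def label_within_boundary(
    sample_answer: Callable[[str], str],
    question: str,
    gold: str,
    n_samples: int = 8,
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> bool:
    """Hypothetical labeling rule: 'known' if sampled accuracy reaches the threshold."""
    return sampled_accuracy(sample_answer, question, gold, n_samples) >= threshold


def answer_with_gated_retrieval(
    question: str,
    within_boundary: Callable[[str], bool],
    answer_direct: Callable[[str], str],
    answer_with_retrieval: Callable[[str], str],
) -> str:
    """Call the (expensive) retrieval path only for queries judged outside the boundary."""
    if within_boundary(question):
        return answer_direct(question)
    return answer_with_retrieval(question)
```

In this sketch, labels produced by `label_within_boundary` would form the automatically constructed training set, and a fine-tuned boundary classifier would play the role of `within_boundary` at inference time, when no gold answer is available.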
