CV CL Apr 7, 2025

Seeking and Updating with Live Visual Knowledge

Mingyang Fu, Yuyang Peng, Dongping Chen, Zetong Zhou, Benlin Liu, Yao Wan, Zhou Zhao, Philip S. Yu, Ranjay Krishna

arXiv:2504.05288v2 21.716 citations h-index: 14

Originality: Incremental advance

AI Analysis

This work addresses the stagnation of MLLMs in applications that require up-to-date visual knowledge, such as news analysis or augmented reality. The advance is incremental because it focuses on dataset creation and benchmarking.

The paper tackles the problem of Multimodal Large Language Models (MLLMs) struggling to stay current with evolving visual information because their training datasets are fixed. It introduces LiveVQA, a dataset of 107,143 samples from April 2024 to May 2025, and benchmarks 17 MLLMs, showing that tool-use or agentic frameworks improve performance by an average of 327% on content beyond knowledge cutoffs.

The visual world around us constantly evolves, from real-time news and social media trends to global infrastructure changes visible through satellite imagery and augmented reality enhancements. However, Multimodal Large Language Models (MLLMs), which automate many tasks, struggle to stay current, limited by the cutoff dates of their fixed training datasets. To quantify this stagnation, we introduce LiveVQA, a first-of-its-kind dataset featuring 107,143 samples across 12 categories, specifically designed to support research in both seeking and updating with live visual knowledge. Drawing from recent news articles, video platforms, and academic publications from April 2024 to May 2025, LiveVQA enables evaluation of how models handle the latest visual information beyond their knowledge boundaries and how current methods help to update them. Our comprehensive benchmarking of 17 state-of-the-art MLLMs reveals significant performance gaps on content beyond the knowledge cutoff, and tool-use or agentic visual seeking frameworks deliver an average improvement of 327%. Furthermore, we explore parameter-efficient fine-tuning (PEFT) methods to update MLLMs with new visual knowledge, examining in depth the critical balance between adapter capacity and model capability. The experimental dataset and source code are publicly available at: https://livevqa.github.io.
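
The abstract does not say how the 327% figure is computed. One plausible reading is a relative gain over each model's baseline accuracy, averaged across models. The sketch below illustrates that reading only. The function names and the accuracy values are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple


def relative_improvement(baseline: float, improved: float) -> float:
    """Percentage gain of `improved` over `baseline` (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


def mean_relative_improvement(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the per-model relative gains (one possible aggregation)."""
    gains = [relative_improvement(b, i) for b, i in pairs]
    if not gains:
        raise ValueError("need at least one (baseline, improved) pair")
    return sum(gains) / len(gains)


# Hypothetical accuracies in percent, NOT results from the paper:
# (without tools, with tools) for two imaginary models.
example_pairs = [(10.0, 40.0), (5.0, 25.0)]

if __name__ == "__main__":
    for base, tool in example_pairs:
        print(f"{base:5.1f} -> {tool:5.1f}: {relative_improvement(base, tool):.1f}%")
    print(f"mean gain: {mean_relative_improvement(example_pairs):.1f}%")
```

Under this definition a 327% improvement means the tool-augmented score is roughly 4.27 times the baseline. Averaging per-model gains and computing the gain of the averaged scores generally give different numbers, so which aggregation the authors used is an assumption here.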
