HC CV Jun 26, 2025

SimVecVis: A Dataset for Enhancing MLLMs in Visualization Understanding

Can Liu, Chunlin Da, Xiaoxiao Long, Yuxiao Yang, Yu Zhang, Yong Wang

arXiv:2506.21319v37.28 citations h-index: 3 Has Code VIS

Originality: Incremental advance

AI Analysis

The paper addresses a domain-specific bottleneck for MLLM users, visualization understanding; its contributions, a new dataset and a new encoding format, are incremental.

The paper addresses the difficulty that multimodal large language models (MLLMs) have with visualization understanding. It proposes SimVec, a simplified vector format for encoding chart elements, and builds the SimVecVis dataset for finetuning. The reported result is a substantial improvement on data-centric QA tasks for models such as MiniCPM.

Current multimodal large language models (MLLMs), while effective in natural image understanding, struggle with visualization understanding due to their inability to decode the data-to-visual mapping and extract structured information. To address these challenges, we propose SimVec, a novel simplified vector format that encodes chart elements such as mark type, position, and size. The effectiveness of SimVec is demonstrated by using MLLMs to reconstruct chart information from SimVec formats. Then, we build a new visualization dataset, SimVecVis, to enhance the performance of MLLMs in visualization understanding. The dataset consists of three key dimensions: bitmap images of charts, their SimVec representations, and corresponding data-centric question-answering (QA) pairs with explanatory chain-of-thought (CoT) descriptions. We finetune state-of-the-art MLLMs (e.g., MiniCPM and Qwen-VL) using SimVecVis with different dataset dimensions. The experimental results show that this finetuning leads to substantial performance improvements in data-centric QA tasks for MLLMs with good spatial perception capabilities (e.g., MiniCPM). Our dataset and source code are available at: https://github.com/VIDA-Lab/SimVecVis.
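
To make the idea concrete, the sketch below shows one way a chart's marks (type, position, size) could be flattened into a compact text form and bundled with a QA pair. It is a minimal illustration only: the abstract does not give the actual SimVec syntax, so the field names, the one-line-per-mark encoding, the `Mark` and `Sample` classes, the file name, and the example values are all assumptions and should not be read as the authors' format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """One chart element. Field names are illustrative, not the SimVec spec."""
    mark_type: str  # e.g. "rect" for a bar
    x: float        # position
    y: float
    width: float    # size
    height: float

    def encode(self) -> str:
        # Hypothetical one-line text form: mark type, then position, then size.
        return f"{self.mark_type} {self.x:g} {self.y:g} {self.width:g} {self.height:g}"


@dataclass(frozen=True)
class Sample:
    """One record holding the three dataset dimensions named in the abstract."""
    image_path: str  # bitmap image of the chart
    simvec: str      # vector-format text for the same chart
    question: str    # data-centric question
    rationale: str   # chain-of-thought explanation
    answer: str


def encode_chart(marks: list[Mark]) -> str:
    """Join the per-mark encodings, one mark per line."""
    return "\n".join(m.encode() for m in marks)


# Two made-up bars; the values are invented for illustration.
bars = [Mark("rect", 10, 60, 20, 40), Mark("rect", 40, 20, 20, 80)]
sample = Sample(
    image_path="chart_0001.png",  # hypothetical file name
    simvec=encode_chart(bars),
    question="Which bar is taller?",
    rationale="The second rect has height 80; the first has height 40.",
    answer="the second bar",
)
print(sample.simvec)
```

Keeping the vector text alongside the bitmap in each record mirrors the abstract's description of finetuning "with different dataset dimensions": a training run could use the image with the QA pair alone, or add the vector text as an intermediate target.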
