CV AI Jun 14, 2024

First Multi-Dimensional Evaluation of Flowchart Comprehension for Multimodal Large Language Models

Enming Zhang, Ruobing Yao, Huanyong Liu, Junhui Yu, Jiale Wang

arXiv:2406.10057v35.22 citations Has Code

Originality Synthesis-oriented

AI Analysis

This paper addresses the problem of evaluating MLLMs on flowchart comprehension, a skill that matters in daily life and work. The contribution is incremental, since it introduces a new benchmark rather than a novel method.

The authors tackle the lack of a comprehensive evaluation method for Multimodal Large Language Models (MLLMs) on flowchart-related tasks. They propose FlowCE, described as the first method to assess MLLMs across multiple dimensions such as reasoning and localization. Even a top model like GPT4o scores only 56.63, and the best open-source model, Phi-3-Vision, reaches 49.97.

As Multimodal Large Language Model (MLLM) technology develops, the general capabilities of these models are becoming increasingly powerful. Numerous evaluation systems have emerged to assess the various abilities of MLLMs. However, there is still no comprehensive method for evaluating MLLMs on tasks related to flowcharts, which are very important in daily life and work. We propose the first comprehensive method, FlowCE, to assess MLLMs across various dimensions for tasks related to flowcharts. It evaluates MLLMs' abilities in Reasoning, Localization Recognition, Information Extraction, Logical Verification, and Summarization on flowcharts. We find that even the GPT4o model achieves a score of only 56.63. Among open-source models, Phi-3-Vision obtains the highest score, 49.97. We hope that FlowCE can contribute to future research on MLLMs for tasks based on flowcharts. Code: https://github.com/360AILABNLP/FlowCE
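The abstract reports a single score per model alongside five evaluation dimensions, but it does not say how the per-dimension results are combined. The sketch below shows one plausible aggregation, an unweighted mean over the five dimensions. The function name `overall_score`, the averaging rule, and the placeholder values are all assumptions made for illustration; they are not taken from the paper or the FlowCE repository.

```python
from statistics import mean

# The five dimensions named in the abstract.
DIMENSIONS = (
    "Reasoning",
    "Localization Recognition",
    "Information Extraction",
    "Logical Verification",
    "Summarization",
)


def overall_score(per_dimension: dict[str, float]) -> float:
    """Combine per-dimension scores into one number.

    ASSUMPTION: an unweighted mean. The abstract does not state
    the actual aggregation rule used by FlowCE.
    """
    missing = [d for d in DIMENSIONS if d not in per_dimension]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return mean(per_dimension[d] for d in DIMENSIONS)


if __name__ == "__main__":
    # Placeholder values only; these are NOT results from the paper.
    placeholder = {d: 50.0 for d in DIMENSIONS}
    print(overall_score(placeholder))
```

If the benchmark weights dimensions differently, or scores some of them with a judge model on a different scale, the mean would need to be replaced by the corresponding rule.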
