CV · Jun 1, 2025

GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs

Xiaorong Zhu, Ziheng Jia, Jiarui Wang, Xiangyu Zhao, Haodong Duan, Xiongkuo Min, Jia Wang, Zicheng Zhang, Guangtao Zhai

arXiv:2506.00991v2 · 3 citations · h-index: 49 · Has Code · MM

Originality: Synthesis-oriented

AI Analysis

This work addresses a gap in evaluating MLLMs for fine-grained physical principles, which is important for researchers and developers in AI and computer vision, though it is incremental as it focuses on a specific domain.

The paper assesses how well Multi-modality Large Language Models (MLLMs) handle geometric optics by introducing GOBench, a benchmark covering both generation and understanding tasks. It finds that current models struggle significantly: the best model reaches only 37.35% accuracy on the understanding task.

The rapid evolution of Multi-modality Large Language Models (MLLMs) is driving significant advancements in visual understanding and generation. Nevertheless, a comprehensive assessment of how well they capture fine-grained physical principles, especially in geometric optics, remains underexplored. To address this gap, we introduce GOBench, the first benchmark to systematically evaluate MLLMs' ability across two tasks: 1) Generating Optically Authentic Imagery and 2) Understanding Underlying Optical Phenomena. We curate high-quality prompts of geometric optical scenarios and use MLLMs to construct the GOBench-Gen-1k dataset. We then organize subjective experiments to assess the generated imagery on Optical Authenticity, Aesthetic Quality, and Instruction Fidelity, revealing generation flaws that violate optical principles. For the understanding task, we apply crafted evaluation instructions to test the optical understanding ability of eleven prominent MLLMs. The experimental results demonstrate that current models face significant challenges in both optical generation and understanding. The top-performing generative model, GPT-4o-Image, cannot perfectly complete all generation tasks, and the best-performing MLLM, Gemini-2.5Pro, attains a mere 37.35% accuracy in optical understanding. The dataset and code are publicly available at https://github.com/aiben-ch/GOBench.
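The two evaluation tracks described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the GOBench repository: the record layout, the field names, the numeric rating scale, and the functions `mean_scores` and `accuracy` are all assumptions made for illustration. It averages subjective ratings per model along the three stated dimensions and computes percentage accuracy for the understanding task.

```python
from collections import defaultdict
from statistics import mean

# The three rating dimensions named in the abstract; the key spellings are assumed.
DIMENSIONS = ("optical_authenticity", "aesthetic_quality", "instruction_fidelity")


def mean_scores(ratings):
    """Average subjective ratings per model and dimension.

    `ratings` is an iterable of dicts with a 'model' key plus one numeric
    score per dimension (hypothetical layout, not the GOBench file format).
    """
    by_model = defaultdict(lambda: defaultdict(list))
    for record in ratings:
        for dim in DIMENSIONS:
            by_model[record["model"]][dim].append(record[dim])
    return {
        model: {dim: mean(values) for dim, values in dims.items()}
        for model, dims in by_model.items()
    }


def accuracy(predictions, answers):
    """Percentage of understanding questions answered correctly."""
    if not answers or len(predictions) != len(answers):
        raise ValueError("predictions and answers must be non-empty and equal in length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)
```

Under these assumptions, a model's generation quality is summarized by three averages, and its understanding ability by a single percentage comparable to the 37.35% figure reported above.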
