CL AI CV

Dec 13, 2023

Assessing GPT4-V on Structured Reasoning Tasks

Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Gust Verbruggen

Microsoft

arXiv:2312.11524v14.319 citations

h-index: 63

Originality: Synthesis-oriented

AI Analysis

This work assesses the capabilities of a new multimodal AI model, which is useful to researchers and practitioners, but its contribution is incremental because it focuses on benchmarking and prompting techniques.

The study evaluated GPT-4V and five other baselines on structured reasoning tasks such as mathematical reasoning, visual data analysis, and code generation, finding that visual Chain-of-Thought prompting significantly improved performance over the vanilla model.

Multi-modality promises to unlock further uses for large language models. Recently, the state-of-the-art language model GPT-4 was enhanced with vision capabilities. We carry out a prompting evaluation of GPT-4V and five other baselines on structured reasoning tasks, such as mathematical reasoning, visual data analysis, and code generation. We show that visual Chain-of-Thought, an extension of Chain-of-Thought to multi-modal LLMs, yields significant improvements over the vanilla model. We also present a categorized analysis of scenarios where these models perform well and where they struggle, highlighting challenges associated with coherent multimodal reasoning.
