CV · AI · Mar 18, 2024

CollagePrompt: A Benchmark for Budget-Friendly Visual Recognition with GPT-4V

Siyu Xu, Yunke Wang, Daochang Liu, Bo Du, Chang Xu

arXiv:2403.11468v2 · 11 citations · h-index: 11 · NAACL

AI Analysis

This work addresses the cost barrier that researchers and practitioners face when using GPT-4V for visual recognition, though the contribution is incremental because it builds on existing prompting techniques.

The paper tackles the high financial cost of using GPT-4V for visual recognition by proposing collage prompting, which combines multiple images into a single visual prompt so that one request covers several images. It finds that recognition accuracy varies with an image's position in the collage and with how images are grouped, and that incorrect labels often come from adjacent images.

Recent advancements in generative AI have suggested that by taking visual prompts, GPT-4V can demonstrate significant proficiency in visual recognition tasks. Despite its impressive capabilities, the financial cost associated with GPT-4V's inference presents a substantial barrier to its wide use. To address this challenge, we propose a budget-friendly collage prompting task that collages multiple images into a single visual prompt and makes GPT-4V perform visual recognition on several images simultaneously, thereby reducing the cost. We collect a dataset of various collage prompts to assess GPT-4V's visual recognition performance on them. Our evaluations reveal several key findings: 1) Recognition accuracy varies with different positions in the collage. 2) Grouping images of the same category together leads to better visual recognition results. 3) Incorrect labels often come from adjacent images. These findings highlight the importance of image arrangement within a collage prompt. To this end, we construct a benchmark called CollagePrompt, which offers a platform for designing collage prompts to achieve more cost-effective visual recognition with GPT-4V. A baseline method derived from genetic algorithms is proposed to optimize collage layouts, and two metrics are introduced to measure the efficiency of the optimized collage prompt. Our benchmark enables researchers to better optimize collage prompts, thus making GPT-4V more cost-effective in visual recognition. The code and data are available at the project page: https://collageprompting.github.io/.
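To make the layout-optimization idea concrete, here is a minimal sketch of a genetic algorithm over collage layouts. It is not the authors' baseline: the abstract only says the baseline is "derived from genetic algorithms". Everything named below (`adjacent_pairs`, `fitness`, `order_crossover`, `swap_mutation`, `optimize_layout`, and the `guesses` list of rough per-image category guesses) is a hypothetical illustration. The fitness function is a stand-in proxy motivated by the second finding: it counts adjacent grid cells whose guessed categories match. A real system would need its own estimate of collage accuracy.

```python
import random


def adjacent_pairs(rows, cols):
    """Yield index pairs of horizontally or vertically adjacent grid cells."""
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            if c + 1 < cols:
                yield i, i + 1
            if r + 1 < rows:
                yield i, i + cols


def fitness(layout, guesses, rows, cols):
    """Hypothetical proxy: count adjacent cells whose guessed categories match."""
    return sum(
        guesses[layout[a]] == guesses[layout[b]]
        for a, b in adjacent_pairs(rows, cols)
    )


def order_crossover(p1, p2, rng):
    """Keep a random slice of p1 in place; fill the rest in p2's order."""
    n = len(p1)
    lo, hi = sorted(rng.sample(range(n + 1), 2))
    kept = p1[lo:hi]
    kept_set = set(kept)
    rest = [g for g in p2 if g not in kept_set]
    return rest[:lo] + kept + rest[lo:]


def swap_mutation(layout, rng, rate=0.2):
    """With probability `rate`, swap two positions in a copy of the layout."""
    child = list(layout)
    if rng.random() < rate:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    return child


def optimize_layout(guesses, rows, cols, pop_size=30, generations=50, seed=0):
    """Search for a permutation of image indices that maximizes the proxy."""
    rng = random.Random(seed)
    n = rows * cols
    if len(guesses) != n:
        raise ValueError("need exactly one category guess per grid cell")

    def score(layout):
        return fitness(layout, guesses, rows, cols)

    # layout[k] is the index of the image placed in grid cell k (row-major).
    population = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        elite = population[: pop_size // 2]  # the best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        population = elite + children
    return max(population, key=score)
```

For example, `optimize_layout(["cat", "dog", "cat", "dog"], rows=2, cols=2)` returns a permutation of `[0, 1, 2, 3]` giving the cell-by-cell placement of the four images. Because the best half of each generation is carried over unchanged, the best proxy score never decreases between generations.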
