HC, CV | Sep 18, 2025

QuizRank: Picking Images by Quizzing VLMs

arXiv:2509.15059v2 | h-index: 52
Originality: Incremental advance
AI Analysis

QuizRank addresses the challenge of image selection for Wikipedia editors, who may lack training in choosing images, by automating the process to improve article readability and comprehension.

The paper tackles the selection of effective images for Wikipedia articles. It proposes QuizRank, a method that uses vision-language models to rank images by how well they help answer multiple-choice questions about the subject's visual characteristics. The authors report high congruence with human quiz-takers and effective discriminative ranking of images.

Images play a vital role in improving the readability and comprehension of Wikipedia articles by serving as 'illustrative aids.' However, not all images are equally effective and not all Wikipedia editors are trained in their selection. We propose QuizRank, a novel method of image selection that leverages large language models (LLMs) and vision language models (VLMs) to rank images as learning interventions. Our approach transforms textual descriptions of the article's subject into multiple-choice questions about important visual characteristics of the concept. We utilize these questions to quiz the VLM: the better an image can help answer questions, the higher it is ranked. To further improve discrimination between visually similar items, we introduce a Contrastive QuizRank that leverages differences in the features of target (e.g., a Western Bluebird) and distractor concepts (e.g., Mountain Bluebird) to generate questions. We demonstrate the potential of VLMs as effective visual evaluators by showing a high congruence with human quiz-takers and an effective discriminative ranking of images.
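The ranking step described in the abstract can be illustrated with a minimal sketch. It assumes the multiple-choice questions already exist and that the score of an image is simply the fraction of questions answered correctly when the VLM sees that image; the abstract does not specify the exact scoring rule, so this is an assumption. The names `Question`, `quiz_score`, `quiz_rank`, and `toy_answer` are hypothetical, and `toy_answer` is a stand-in for a real VLM call, not part of any library.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Question:
    text: str
    options: tuple[str, ...]
    correct: int  # index of the correct option


# Hypothetical VLM interface: (image_id, question) -> index of the chosen option.
AnswerFn = Callable[[str, Question], int]


def quiz_score(image_id: str, questions: Sequence[Question], answer_fn: AnswerFn) -> float:
    """Fraction of questions answered correctly with this image (assumed scoring rule)."""
    if not questions:
        return 0.0
    hits = sum(answer_fn(image_id, q) == q.correct for q in questions)
    return hits / len(questions)


def quiz_rank(image_ids: Sequence[str], questions: Sequence[Question],
              answer_fn: AnswerFn) -> list[tuple[str, float]]:
    """Rank images from highest to lowest quiz score; ties keep input order."""
    scored = [(img, quiz_score(img, questions, answer_fn)) for img in image_ids]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative questions and a toy "VLM" for demonstration only.
QUESTIONS = [
    Question("What color is the bird's breast?", ("rust-orange", "pale blue", "white"), 0),
    Question("What color is the bird's head?", ("blue", "brown", "black"), 0),
]

# Made-up data: which questions each image lets the toy model answer correctly.
ANSWERABLE = {
    "img_a": {QUESTIONS[0].text, QUESTIONS[1].text},
    "img_b": {QUESTIONS[0].text},
    "img_c": set(),
}


def toy_answer(image_id: str, q: Question) -> int:
    """Return the correct option if the image 'shows' the feature, else a wrong one."""
    if q.text in ANSWERABLE.get(image_id, set()):
        return q.correct
    return (q.correct + 1) % len(q.options)


ranking = quiz_rank(["img_c", "img_a", "img_b"], QUESTIONS, toy_answer)
```

In a real pipeline, `toy_answer` would be replaced by a function that sends the image and the question to a VLM and parses the selected option. A contrastive variant would change only how the questions are generated, not this ranking loop.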

