CL AI Dec 22, 2024

Rationale-guided Prompting for Knowledge-based Visual Question Answering

arXiv:2412.16936v371 citations, h-index: 7
Originality: Incremental advance
AI Analysis

This work addresses a gap in knowledge-based VQA: existing methods prompt LLMs to predict answers directly, which does not sufficiently activate their capacities. The contribution is incremental, since it builds on existing Chain of Thought prompting methods.

The paper tackles knowledge-based Visual Question Answering (VQA) by proposing a framework that prompts Large Language Models (LLMs) to generate rationale heuristics, i.e., intermediate thought processes, before predicting answers. The reported improvement over existing baselines is more than 2.2 on OK-VQA and more than 2.1 on A-OKVQA.

Recently, Large Language Models (LLMs) have been used for knowledge-based Visual Question Answering (VQA). Despite the encouraging results of previous studies, prior methods prompt LLMs to predict answers directly, neglecting intermediate thought processes. We argue that prior methods do not sufficiently activate the capacities of LLMs. We propose a framework called PLRH that Prompts LLMs with Rationale Heuristics for knowledge-based VQA. PLRH prompts LLMs with Chain of Thought (CoT) to generate rationale heuristics, i.e., intermediate thought processes, and then leverages the rationale heuristics to inspire LLMs to predict answers. Experiments show that our approach outperforms the existing baselines by more than 2.2 and 2.1 on OK-VQA and A-OKVQA, respectively.
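The two-stage idea in the abstract (first elicit a rationale with a CoT prompt, then condition the answer on that rationale) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the function names, the `LLM` callable type, and the use of an image caption as the textual stand-in for the image are all assumptions made for this example.

```python
from typing import Callable, Tuple

# Hypothetical interface: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def build_rationale_prompt(caption: str, question: str) -> str:
    """Stage 1 prompt: ask for an intermediate thought process (CoT style)."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        "Let's think step by step.\n"
        "Rationale:"
    )


def build_answer_prompt(caption: str, question: str, rationale: str) -> str:
    """Stage 2 prompt: supply the rationale as a heuristic before the answer."""
    return (
        f"Context: {caption}\n"
        f"Question: {question}\n"
        f"Rationale: {rationale}\n"
        "Answer:"
    )


def answer_with_rationale(llm: LLM, caption: str, question: str) -> Tuple[str, str]:
    """Generate a rationale first, then predict the answer conditioned on it."""
    rationale = llm(build_rationale_prompt(caption, question)).strip()
    answer = llm(build_answer_prompt(caption, question, rationale)).strip()
    return rationale, answer


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs without any API access."""
    if prompt.rstrip().endswith("Answer:"):
        return "rain"
    return "People carry umbrellas to stay dry."


if __name__ == "__main__":
    rationale, answer = answer_with_rationale(
        stub_llm,
        caption="A person holding an umbrella on a street.",
        question="What weather is this object used for?",
    )
    print("Rationale:", rationale)
    print("Answer:", answer)
```

Swapping `stub_llm` for a call to an actual model leaves the rest unchanged. The point of the split is that the second prompt contains the generated rationale, whereas direct-answer prompting would skip stage 1 entirely.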
