AI · CV · Oct 21, 2024

Improve Vision Language Model Chain-of-thought Reasoning

CMU
arXiv:2410.16198v1 · 138 citations · h-index: 25 · ACL
Originality: Incremental advance
AI Analysis

This work addresses the need for more interpretable and trustworthy vision language models, but it is an incremental advance that builds on existing methods such as distillation and reinforcement learning.

The paper tackles poor chain-of-thought (CoT) reasoning in vision language models, which it attributes to training on short annotations. It distills rationales from GPT-4o and then applies reinforcement learning with Direct Preference Optimization (DPO), yielding significant CoT improvements on benchmark datasets and better generalization to direct answer prediction.
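To make the DPO step concrete, here is a minimal sketch of the standard per-pair DPO objective. It is not the paper's training code. The loss rewards the policy for raising the log-likelihood of the preferred (correct) reasoning chain relative to a frozen reference model, more than it raises the rejected chain. The function names, the use of scalar summed sequence log-probabilities as inputs, and the `beta = 0.1` default are assumptions for illustration.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard per-pair DPO loss: -log sigmoid(beta * margin).

    Each *_logp is the summed log-probability of a full reasoning chain
    (hypothetical inputs; a real trainer computes them from model logits).
    beta=0.1 is an illustrative default, not a value taken from the paper.
    """
    # How far the policy has moved away from the reference on each chain.
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_log_ratio - rejected_log_ratio)
    # -log(sigmoid(m)) == log(1 + exp(-m)) == softplus(-m)
    return softplus(-margin)


def mean_dpo_loss(batch, beta: float = 0.1) -> float:
    """Average DPO loss over (policy_chosen, policy_rejected, ref_chosen, ref_rejected) tuples."""
    return sum(dpo_loss(*row, beta=beta) for row in batch) / len(batch)
```

When the policy still matches the reference on both chains, the margin is zero and the loss is log 2; it falls below log 2 once the policy favors the correct chain more than the reference does. In the usual DPO setup only the policy terms receive gradients, while the reference terms come from a frozen copy of the fine-tuned starting model.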

Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving interpretability and trustworthiness. However, current training recipes lack robust CoT reasoning data, relying on datasets dominated by short annotations with minimal rationales. In this work, we show that training VLMs on short answers does not generalize well to reasoning tasks that require more detailed responses. To address this, we propose a two-fold approach. First, we distill rationales from the GPT-4o model to enrich the training data and fine-tune VLMs, boosting their CoT performance. Second, we apply reinforcement learning to further calibrate reasoning quality. Specifically, we construct positive (correct) and negative (incorrect) pairs of model-generated reasoning chains by comparing their predictions with annotated short answers. Using this pairwise data, we apply the Direct Preference Optimization algorithm to refine the model's reasoning abilities. Our experiments demonstrate significant improvements in CoT reasoning on benchmark datasets and better generalization to direct answer prediction as well. This work emphasizes the importance of incorporating detailed rationales in training and leveraging reinforcement learning to strengthen the reasoning capabilities of VLMs.
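The pair-construction step described in the abstract can be sketched as follows. Sample several reasoning chains per question, parse each chain's final answer, label it positive if it matches the annotated short answer and negative otherwise, then pair every positive with every negative. The `Answer:` marker, the lowercase exact-match normalization, dropping chains with no parseable answer, and the names `extract_answer` and `build_pairs` are all hypothetical choices for illustration, not the paper's implementation.

```python
import itertools
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical output format: each chain ends with a line like "Answer: <text>".
ANSWER_RE = re.compile(r"answer:\s*(.+)", re.IGNORECASE)


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # chain whose final answer matches the annotation
    rejected: str  # chain whose final answer does not


def normalize(text: str) -> str:
    """Lowercase and strip whitespace and a trailing period (illustrative only)."""
    return text.strip().rstrip(".").strip().lower()


def extract_answer(chain: str) -> Optional[str]:
    """Return the last 'Answer:' value in a chain, or None if there is none."""
    matches = ANSWER_RE.findall(chain)
    return normalize(matches[-1]) if matches else None


def build_pairs(prompt: str, short_answer: str, chains: List[str],
                max_pairs: Optional[int] = None) -> List[PreferencePair]:
    gold = normalize(short_answer)
    positives, negatives = [], []
    for chain in chains:
        pred = extract_answer(chain)
        if pred is None:
            continue  # assumption: unparseable chains are dropped, not used as negatives
        (positives if pred == gold else negatives).append(chain)
    # Every correct chain is paired with every incorrect chain.
    pairs = [PreferencePair(prompt, c, r)
             for c, r in itertools.product(positives, negatives)]
    return pairs if max_pairs is None else pairs[:max_pairs]
```

For example, with gold answer "3" and chains ending in "Answer: 3" and "Answer: 4", `build_pairs` returns a single pair with the first chain as `chosen`. Exact string matching is deliberately crude here; numeric or multiple-choice answers would likely need task-specific normalization.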

Code Implementations: 2 repos