CV · Jan 30

One-shot Optimized Steering Vector for Hallucination Mitigation for VLMs

arXiv:2601.23041v1 · 1 citation · h-index: 9
Originality: Incremental advance
AI Analysis

The work addresses reliability issues of VLMs in multimodal applications and offers a practical, scalable solution, although it is incremental because it builds on existing steering techniques.

The paper tackles hallucination and safety failures in Vision Language Models (VLMs) by proposing OSGA, a one-shot, input-independent steering framework that needs only a single optimization instance and yields consistent improvements across multiple benchmarks with negligible overhead.
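As a rough illustration of why one input-independent vector adds so little inference cost, the sketch below adds the same vector, scaled by a strength `alpha`, to the hidden state produced by one fixed layer, for every token. The toy layers, the token-wise forward pass, and the names `add_steering`, `forward_with_steering`, `steer_layer`, and `alpha` are hypothetical; real VLM layers mix information across tokens through attention, and the summary does not say how OSGA scales or injects its vector.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add_steering(hidden: Sequence[float], steer: Sequence[float], alpha: float) -> Vector:
    """Shift one hidden state along the steering direction: h + alpha * v."""
    if len(hidden) != len(steer):
        raise ValueError("hidden state and steering vector must have the same dimension")
    return [h + alpha * s for h, s in zip(hidden, steer)]


def forward_with_steering(tokens: List[Vector], layers: List[Layer], steer_layer: int,
                          steer: Sequence[float], alpha: float = 1.0) -> List[Vector]:
    """Toy token-wise forward pass (hypothetical): after layer `steer_layer`,
    add the same fixed vector to every token's hidden state."""
    if not 0 <= steer_layer < len(layers):
        raise ValueError("steer_layer must index one of the layers")
    outputs = []
    for h in tokens:
        for i, layer in enumerate(layers):
            h = layer(h)
            if i == steer_layer:
                # One vector addition per token at one layer: no weights change.
                h = add_steering(h, steer, alpha)
        outputs.append(h)
    return outputs


if __name__ == "__main__":
    # Two toy token-wise "layers": scale by 2, then shift by 1.
    toy_layers = [lambda h: [2 * x for x in h], lambda h: [x + 1 for x in h]]
    tokens = [[1.0, 0.0], [0.0, 1.0]]
    # The same vector is applied to every token, regardless of the input.
    print(forward_with_steering(tokens, toy_layers, steer_layer=0, steer=[0.5, -0.5]))
    # [[3.5, 0.5], [1.5, 2.5]]
```

With `alpha = 0.0` the function reproduces the unsteered outputs, which is a convenient sanity check. In a PyTorch model the same addition is commonly done inside a forward hook registered with `register_forward_hook` on the chosen layer, though the exact hook code depends on the model's output format.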

Vision Language Models (VLMs) achieve strong performance on multimodal tasks but still suffer from hallucination and safety-related failures that persist even at scale. Steering offers a lightweight way to improve model behavior. However, existing steering methods, whether input-dependent or input-independent, face a trade-off between efficiency and effectiveness. In this work, we observe that steering vectors can generalize across inputs when tasks share aligned semantic intent. Based on this insight, we propose OSGA (One-shot Steering with Generative Anchor), an input-independent framework that improves model performance with a single optimization instance. OSGA first selects an informative sample via a variance-based data selection strategy, then learns a single steering vector using a contrastive objective with generative anchor regularization. The resulting vector can be applied universally at a specific layer during inference without modifying model parameters. Experiments across multiple benchmarks show that a single OSGA-optimized steering vector consistently improves hallucination mitigation and safety with negligible overhead, highlighting one-shot steering as a practical and scalable solution for reliable VLMs.
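The abstract names two optimization ingredients, a variance-based choice of a single training sample and a contrastive objective with generative anchor regularization, without giving their exact form. The sketch below is one possible toy instance under stated assumptions, not OSGA's actual formulation: each candidate sample is summarized by a hypothetical list of scalar scores, and the one with the highest population variance is selected; the loss pulls the steered state `h + v` toward a hypothetical positive target `pos`, pushes it away from a negative target `neg` (squared distances), and adds `lam` times the squared distance to an `anchor` that stands in for the generative anchor. All names (`select_sample`, `loss_and_grad`, `optimize_steering`, `lam`, `lr`) and the loss form are assumptions.

```python
import statistics
from typing import List, Sequence, Tuple

Vector = List[float]


def select_sample(candidate_scores: Sequence[Sequence[float]]) -> int:
    """Hypothetical variance-based selection: index of the candidate whose scores vary most."""
    variances = [statistics.pvariance(scores) for scores in candidate_scores]
    return max(range(len(variances)), key=variances.__getitem__)


def loss_and_grad(v: Sequence[float], h: Sequence[float], pos: Sequence[float],
                  neg: Sequence[float], anchor: Sequence[float],
                  lam: float) -> Tuple[float, Vector]:
    """Toy loss ||h+v-pos||^2 - ||h+v-neg||^2 + lam * ||h+v-anchor||^2 and its gradient in v."""
    steered = [hi + vi for hi, vi in zip(h, v)]
    d_pos = sum((s - p) ** 2 for s, p in zip(steered, pos))
    d_neg = sum((s - n) ** 2 for s, n in zip(steered, neg))
    d_anc = sum((s - a) ** 2 for s, a in zip(steered, anchor))
    loss = d_pos - d_neg + lam * d_anc
    grad = [2 * (s - p) - 2 * (s - n) + 2 * lam * (s - a)
            for s, p, n, a in zip(steered, pos, neg, anchor)]
    return loss, grad


def optimize_steering(h: Sequence[float], pos: Sequence[float], neg: Sequence[float],
                      anchor: Sequence[float], lam: float = 1.0, lr: float = 0.1,
                      steps: int = 200) -> Vector:
    """Plain gradient descent on one sample's toy loss, starting from v = 0."""
    if lam <= 0:
        # Without the anchor term the toy contrastive part is linear in v, hence unbounded below.
        raise ValueError("lam must be positive for this toy loss to have a minimum")
    # The Hessian is 2 * lam * I, so descent converges when 0 < lr < 1 / lam.
    v = [0.0] * len(h)
    for _ in range(steps):
        _, grad = loss_and_grad(v, h, pos, neg, anchor, lam)
        v = [vi - lr * gi for vi, gi in zip(v, grad)]
    return v


if __name__ == "__main__":
    idx = select_sample([[1.0, 1.0, 1.0], [0.0, 2.0, 4.0], [1.0, 2.0, 3.0]])
    print(idx)  # 1 (variances 0, 8/3, 2/3)
    v = optimize_steering(h=[0.0, 0.0], pos=[1.0, 0.0], neg=[0.0, 1.0], anchor=[0.5, 0.5])
    print([round(x, 6) for x in v])  # [1.5, -0.5]
```

In this toy loss the anchor term is what makes the problem well posed: with `lam > 0` the minimizer has the closed form `v = anchor - h - (neg - pos) / lam`, which gradient descent approaches. How OSGA's actual generative anchor is built and weighted is not stated in the abstract.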

