Nudging Hidden States: Training-Free Model Steering for Chain-of-Thought Reasoning in Large Audio-Language Models
This work addresses the challenge of enhancing reasoning in audio-language models for applications like speech-based AI, but it is incremental as it builds on existing CoT and steering methods.
The paper tackled the problem of improving chain-of-thought reasoning in large audio-language models without training, by introducing training-free model steering strategies, resulting in accuracy gains up to 4.4% over standard CoT prompting.
Chain-of-thought (CoT) prompting has been extended to large audio-language models (LALMs) to elicit reasoning, yet enhancing its effectiveness without training remains challenging. We study inference-time model steering as a training-free approach to improve LALM reasoning. We introduce three strategies using diverse information sources and evaluate them across four LALMs and four benchmarks. Results show general accuracy gains up to 4.4% over CoT prompting. Notably, we identify a cross-modal transfer where steering vectors derived from few text samples effectively guide speech-based reasoning, demonstrating high data efficiency. We also examine hyperparameter sensitivity to understand the robustness of these approaches. Our findings position model steering as a practical direction for strengthening LALM reasoning.