CL AI | Mar 30, 2025

An Analysis of Decoding Methods for LLM-based Agents for Faithful Multi-Hop Question Answering

Alexander Murphy, Mohd Sanad Zaki Rizvi, Aden Haussmann, Ping Nie, Guifu Liu, Aryo Pradipta Gema, Pasquale Minervini

arXiv:2503.23415v1 | 6.72 citations | h-index: 11

Originality: Synthesis-oriented

AI Analysis

This paper addresses factual inaccuracies in LLM outputs on knowledge-intensive NLP tasks, but its contribution is incremental because it builds on existing frameworks and methods.

The paper tackles the problem of LLM hallucination in multi-hop question answering by analyzing how combining the ReAct agentic framework with decoding methods like DoLa improves faithfulness, resulting in an F1 increase from 19.5 to 32.6 on HotpotQA.

Large Language Models (LLMs) frequently produce factually inaccurate outputs - a phenomenon known as hallucination - which limits their accuracy in knowledge-intensive NLP tasks. Retrieval-augmented generation and agentic frameworks such as Reasoning and Acting (ReAct) can address this issue by giving the model access to external knowledge. However, LLMs often fail to remain faithful to retrieved information. Mitigating this is critical, especially if LLMs are required to reason about the retrieved information. Recent research has explored training-free decoding strategies to improve the faithfulness of model generations. We present a systematic analysis of how the combination of the ReAct framework and decoding strategies (i.e., DeCoRe, DoLa, and CAD) can influence the faithfulness of LLM-generated answers. Our results show that combining an agentic framework for knowledge retrieval with decoding methods that enhance faithfulness can increase accuracy on the downstream Multi-Hop Question Answering tasks. For example, we observe an F1 increase from 19.5 to 32.6 on HotpotQA when using ReAct and DoLa.
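The abstract reports F1 scores without describing how they are computed. The sketch below shows the token-overlap F1 commonly used to score short answers in extractive and multi-hop QA. It is an assumption that the paper's evaluation follows this form. The function names `normalize` and `token_f1` are illustrative and are not taken from the paper.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, strip punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and a reference answer (range 0 to 1)."""
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    # If either side is empty after normalization, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical answers, for illustration only.
    print(token_f1("The Eiffel Tower", "Eiffel Tower"))
    print(token_f1("Paris, France", "Paris"))
```

This function returns a value between 0 and 1, whereas the abstract's figures appear to be on a 0 to 100 scale. On that scale, the reported change from 19.5 to 32.6 is a gain of 13.1 F1 points.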
