CL · Dec 20, 2022

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, Ashish Sabharwal

arXiv:2212.10509v2 · 36.51163 citations · h-index: 56 · Has Code

Originality: Incremental advance

AI Analysis

It addresses the problem of missing or outdated knowledge in LLMs on complex multi-step QA tasks, though the contribution is incremental because it builds on existing retrieval and CoT methods.

The paper tackles knowledge-intensive multi-step question answering, where one-step retrieval (retrieving once, using only the question) leaves large language models without all the facts they need. It proposes IRCoT, which interleaves retrieval with chain-of-thought reasoning. IRCoT improves retrieval by up to 21 points and QA performance by up to 15 points across four datasets, with similar gains in out-of-distribution settings and with smaller models.

Prompting-based large language models (LLMs) are surprisingly powerful at generating natural language reasoning steps or Chains-of-Thought (CoT) for multi-step question answering (QA). They struggle, however, when the necessary knowledge is either unavailable to the LLM or not up-to-date within its parameters. While using the question to retrieve relevant text from an external knowledge source helps LLMs, we observe that this one-step retrieve-and-read approach is insufficient for multi-step QA. Here, "what to retrieve" depends on "what has already been derived", which in turn may depend on "what was previously retrieved". To address this, we propose IRCoT, a new approach for multi-step QA that interleaves retrieval with steps (sentences) in a CoT, guiding the retrieval with CoT and in turn using retrieved results to improve CoT. Using IRCoT with GPT3 substantially improves retrieval (up to 21 points) as well as downstream QA (up to 15 points) on four datasets: HotpotQA, 2WikiMultihopQA, MuSiQue, and IIRC. We observe similar substantial gains in out-of-distribution (OOD) settings as well as with much smaller models such as Flan-T5-large without additional training. IRCoT reduces model hallucination, resulting in factually more accurate CoT reasoning. Code, data, and prompts are available at https://github.com/stonybrooknlp/ircot.
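The retrieve-and-reason loop described in the abstract can be sketched in a few lines of Python. This is a minimal toy, not the authors' implementation (the linked repository has that): the corpus, the word-overlap `retrieve` function, and the scripted `toy_reason_step` standing in for GPT3 or Flan-T5 are all hypothetical, and stopping once a sentence contains "answer is" is an assumed rule. Setting `interleave=False` gives the one-step retrieve-and-read baseline, in which only the question is ever used as a query.

```python
import re

# Hypothetical toy corpus standing in for an external knowledge source.
CORPUS = [
    "Lost Gravity is a 2019 film directed by Mara Quill.",
    "Mara Quill is a filmmaker born in Tromso.",
    "Gravity is the force that attracts masses.",
    "Tromso is a city in northern Norway.",
]

STOPWORDS = {"a", "an", "the", "of", "in", "is", "was", "by", "where", "so", "answer"}


def tokens(text):
    """Lowercased content words (a crude stand-in for a real retriever's analyzer)."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def retrieve(query, k, exclude=()):
    """Return up to k paragraphs not in `exclude`, ranked by word overlap with query."""
    q = tokens(query)
    scored = [(len(q & tokens(p)), i) for i, p in enumerate(CORPUS) if p not in exclude]
    scored = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [CORPUS[i] for _, i in scored[:k]]


# Hypothetical scripted "LLM": each CoT sentence is produced only if the
# paragraph supporting it has already been retrieved.
SCRIPT = [
    ("Lost Gravity was directed by Mara Quill.", CORPUS[0]),
    ("Mara Quill was born in Tromso.", CORPUS[1]),
    ("So the answer is: Tromso.", CORPUS[1]),
]


def toy_reason_step(question, paragraphs, cot):
    """Return the next CoT sentence, or give up if its evidence is missing."""
    step = len(cot)
    if step >= len(SCRIPT) or SCRIPT[step][1] not in paragraphs:
        return "So the answer is: unknown."
    return SCRIPT[step][0]


def ircot(question, reason_step, k=1, max_steps=5, interleave=True):
    """Alternate CoT sentences with retrieval; interleave=False is one-step retrieval."""
    paragraphs = retrieve(question, k)  # initial retrieval uses the question itself
    cot = []
    for _ in range(max_steps):
        sentence = reason_step(question, paragraphs, cot)  # reason step
        cot.append(sentence)
        if "answer is" in sentence.lower():  # assumed stopping rule
            break
        if interleave:  # retrieve step: the newest CoT sentence is the query
            paragraphs += retrieve(sentence, k, exclude=paragraphs)
    return paragraphs, cot


question = "Where was the director of Lost Gravity born?"
for mode in (False, True):
    paras, cot = ircot(question, toy_reason_step, interleave=mode)
    print("interleave" if mode else "one-step", "->", cot[-1])
# prints:
# one-step -> So the answer is: unknown.
# interleave -> So the answer is: Tromso.
```

With one-step retrieval the question never mentions Mara Quill, so the paragraph giving her birthplace is never fetched. Interleaving lets the first CoT sentence surface that name and use it as the next query. Excluding already-collected paragraphs from each retrieval is a simplification chosen here so that k=1 still makes progress.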

