Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning
This work addresses scalability and quality issues in automated CoT generation for LLMs, improving multi-hop reasoning in open-domain question answering. The contribution is incremental, since it builds on existing CoT and LLM techniques.
The paper tackles the problem of generating high-quality chain-of-thought (CoT) prompts for large language models (LLMs) in open-domain multi-hop reasoning (ODMR). It proposes Self-prompted Chain-of-Thought (SP-CoT) to automate this process. On four multi-hop question-answering benchmarks, SP-CoT significantly surpasses previous state-of-the-art methods on large-scale (175B) LLMs and nearly doubles zero-shot performance on small-scale (13B) LLMs.
In open-domain question-answering (ODQA), most existing questions require single-hop reasoning on commonsense. To further extend this task, we formally introduce open-domain multi-hop reasoning (ODMR), which answers multi-hop questions with explicit reasoning steps in an open-domain setting. Recently, large language models (LLMs) have found significant utility in facilitating ODQA without an external corpus. Furthermore, chain-of-thought (CoT) prompting, in either manual or automated paradigms, further boosts the reasoning capability of LLMs. However, existing automated methods lack quality assurance, while manual approaches suffer from limited scalability and poor diversity, hindering the capabilities of LLMs. In this paper, we propose Self-prompted Chain-of-Thought (SP-CoT), an automated framework to mass-produce high-quality CoTs of LLMs, by LLMs and for LLMs. SP-CoT introduces an automated generation pipeline for high-quality ODMR datasets, an adaptive sampler for in-context CoT selection, and self-prompted inference via in-context learning. Extensive experiments on four multi-hop question-answering benchmarks show that our proposed SP-CoT not only significantly surpasses the previous SOTA methods on large-scale (175B) LLMs, but also nearly doubles the zero-shot performance of small-scale (13B) LLMs. Further analysis reveals the remarkable capability of SP-CoT to elicit direct and concise intermediate reasoning steps by recalling ~50% of intermediate answers on the MuSiQue-Ans dataset.
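The abstract names in-context CoT selection and self-prompted inference but does not describe how the sampler chooses demonstrations. The following is only a rough sketch of the general idea, under the assumption that demonstrations are picked by similarity to the test question. The bag-of-words similarity, the function names (`select_demos`, `build_prompt`), and the toy `DEMO_POOL` are all hypothetical stand-ins and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical pool of self-generated multi-hop demonstrations (illustrative only).
DEMO_POOL = [
    {"question": "What is the capital of the country where the Eiffel Tower is located?",
     "steps": ["The Eiffel Tower is located in France.", "The capital of France is Paris."],
     "answer": "Paris"},
    {"question": "In which country was the author of Hamlet born?",
     "steps": ["Hamlet was written by William Shakespeare.", "William Shakespeare was born in England."],
     "answer": "England"},
    {"question": "Which ocean borders the country whose capital is Lima?",
     "steps": ["Lima is the capital of Peru.", "Peru borders the Pacific Ocean."],
     "answer": "Pacific Ocean"},
]


def bag_of_words(text):
    """Lowercased token counts; an assumed stand-in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_demos(question, pool, k=2):
    """Return the k pool entries whose questions are most similar to `question`."""
    q_vec = bag_of_words(question)
    ranked = sorted(pool, key=lambda d: cosine(q_vec, bag_of_words(d["question"])), reverse=True)
    return ranked[:k]


def build_prompt(question, demos):
    """Place each demonstration (question, steps, answer) before the test question."""
    parts = []
    for d in demos:
        steps = "\n".join(f"Step {i}: {s}" for i, s in enumerate(d["steps"], start=1))
        parts.append(f"Question: {d['question']}\n{steps}\nAnswer: {d['answer']}")
    # Leave the first reasoning step open for the model to continue.
    parts.append(f"Question: {question}\nStep 1:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    test_question = "What is the capital of the country where the Colosseum is located?"
    print(build_prompt(test_question, select_demos(test_question, DEMO_POOL, k=2)))
```

The resulting prompt string would then be sent to an LLM, which continues from "Step 1:" with its own reasoning chain. A practical system would likely replace the bag-of-words similarity with learned sentence embeddings.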