Large Language Models Are Self-Taught Reasoners: Enhancing LLM Applications via Tailored Problem-Solving Demonstrations
This addresses the issue of high human effort and performance gaps in specialized domains like clinical diagnosis, though it is incremental as it builds on existing prompting methods.
The paper tackles the problem of costly and suboptimal human-authored demonstrations for guiding large language models by introducing SELF-TAUGHT, a framework that automatically creates tailored demonstrations, achieving superior performance in 15 diverse tasks including Alzheimer's disease diagnosis.
Guiding large language models with a selected set of human-authored demonstrations is a common practice for improving LLM applications. However, human effort can be costly, especially in specialized domains (e.g., clinical diagnosis), and does not guarantee optimal performance due to the potential discrepancy of target skills between selected demonstrations and real test instances. Motivated by these, this paper explores the automatic creation of customized demonstrations, whose target skills align with the given target instance. We present SELF-TAUGHT, a problem-solving framework, which facilitates demonstrations that are "tailored" to the target problem and "filtered" for better quality (i.e., correctness) in a zero-shot manner. In 15 tasks of multiple-choice questions of diverse domains and the diagnosis of Alzheimer's disease (AD) with real-world patients, SELF-TAUGHT achieves superior performance to strong baselines (e.g., Few-shot CoT, Plan-and-Solve, Auto-CoT). We conduct comprehensive analyses on SELF-TAUGHT, including its generalizability to existing prompting methods and different LLMs, the quality of its intermediate generation, and more.