CL · May 24, 2023

Mastering the ABCDs of Complex Questions: Answer-Based Claim Decomposition for Fine-grained Self-Evaluation

arXiv:2305.14750v1 · 2 citations
Originality: Incremental advance
AI Analysis

This paper addresses insufficient self-evaluation in LLMs for complex questions, though it appears incremental because it builds on existing self-evaluation techniques.

The paper tackles the problem of large language models (LLMs) producing answers that do not meet all criteria of a complex question. It proposes answer-based claim decomposition (ABCD), a prompting strategy that decomposes a question into true/false claims for fine-grained self-evaluation. Preliminary experiments on three datasets, including the newly collected ObscureQA, suggest that GPT-3.5 has some ability to determine the extent to which its answers satisfy the question criteria, and that the claim-level results give insight into the model's errors and knowledge gaps.

When answering complex questions, large language models (LLMs) may produce answers that do not satisfy all criteria of the question. While existing self-evaluation techniques aim to detect if such answers are correct, these techniques are unable to determine which criteria of the question are satisfied by the generated answers. To address this issue, we propose answer-based claim decomposition (ABCD), a prompting strategy that decomposes questions into a series of true/false claims that can be used to verify which criteria of the input question an answer satisfies. Using the decomposed ABCD claims, we perform fine-grained self-evaluation. Through preliminary experiments on three datasets, including a newly-collected challenge dataset ObscureQA, we find that GPT-3.5 has some ability to determine to what extent its answer satisfies the criteria of the input question, and can give insights into the errors and knowledge gaps of the model.
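The claim-level evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ClaimVerdict`, `fill_claims`, `evaluate_answer`, and `satisfied_fraction` are hypothetical, the claim templates are assumed to be already decomposed (in the paper this step is done by prompting an LLM), and `toy_verify` is a lookup-table stand-in for the model's true/false judgment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClaimVerdict:
    """One decomposed claim and the verifier's true/false judgment."""
    claim: str
    is_true: bool


def fill_claims(claim_templates: List[str], answer: str) -> List[str]:
    # Substitute the candidate answer into each claim template.
    return [t.replace("{answer}", answer) for t in claim_templates]


def evaluate_answer(
    claim_templates: List[str],
    answer: str,
    verify: Callable[[str], bool],
) -> List[ClaimVerdict]:
    # Judge every claim independently so failures can be traced
    # back to a specific criterion of the question.
    return [ClaimVerdict(c, verify(c)) for c in fill_claims(claim_templates, answer)]


def satisfied_fraction(verdicts: List[ClaimVerdict]) -> float:
    # Share of claims judged true; 0.0 when there are no claims.
    if not verdicts:
        return 0.0
    return sum(v.is_true for v in verdicts) / len(verdicts)


# Assumption: a toy fact table replaces the LLM verifier.
TOY_FACTS = {"Herman Melville wrote a novel published in 1851."}


def toy_verify(claim: str) -> bool:
    return claim in TOY_FACTS


# Hypothetical question: "Which author wrote a novel published in 1851
# and was born in Boston?"
templates = [
    "{answer} wrote a novel published in 1851.",
    "{answer} was born in Boston.",
]
verdicts = evaluate_answer(templates, "Herman Melville", toy_verify)
failed = [v.claim for v in verdicts if not v.is_true]
```

In this toy run the answer satisfies one of the two claims, and `failed` names the unmet criterion, which is the kind of fine-grained signal a single correct/incorrect judgment does not provide.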
