IR · AI · CL · LG · Feb 21, 2025

On Synthesizing Data for Context Attribution in Question Answering

arXiv:2504.05317v2 · 2 citations · h-index: 16 · ACL
Originality: Incremental advance
AI Analysis

This work addresses hallucinations in LLM-based question answering with the aim of improving trustworthiness. It is an incremental advance because it builds on existing LLM-based approaches.

The paper tackles the problem of LLMs producing false or misleading answers in question answering by developing a method to ground those answers in the provided context. It shows that fine-tuning small LMs on synthetic data generated with the proposed SynQA strategy is highly effective for context attribution across different tasks and domains.

Question Answering (QA) accounts for a significant portion of LLM usage "in the wild". However, LLMs sometimes produce false or misleading responses, also known as "hallucinations". Therefore, grounding the generated answers in contextually provided information (i.e., providing evidence for the generated text) is paramount for LLMs' trustworthiness. Providing this information is the task of context attribution. In this paper, we systematically study LLM-based approaches for this task; namely, we investigate (i) zero-shot inference, (ii) LLM ensembling, and (iii) fine-tuning of small LMs on synthetic data generated by larger LLMs. Our key contribution is SynQA: a novel generative strategy for synthesizing context attribution data. Given selected context sentences, an LLM generates QA pairs that are supported by these sentences. This leverages LLMs' natural strengths in text generation while ensuring clear attribution paths in the synthetic training data. We show that the attribution data synthesized via SynQA is highly effective for fine-tuning small LMs for context attribution in different QA tasks and domains. Finally, with a user study, we validate the usefulness of small LMs (fine-tuned on synthetic data from SynQA) in context attribution for QA.
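The generative idea behind SynQA (pick context sentences, have an LLM write a QA pair those sentences support, and keep the picked indices as the attribution label) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the uniform random sentence selection, the prompt wording, the JSON reply format, the training-pair layout, and the names `synthesize`, `to_training_pair`, and `stub_llm` are all hypothetical. `stub_llm` stands in for a call to a real LLM so the example runs offline.

```python
import json
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class AttributionExample:
    context: List[str]  # every sentence of the source passage
    question: str
    answer: str
    support: List[int]  # indices of the sentences the QA pair was generated from


def build_prompt(sentences: List[str], selected: List[int]) -> str:
    """Ask the generator for one QA pair answerable from the selected sentences only."""
    evidence = "\n".join(f"[{i}] {sentences[i]}" for i in selected)
    return (
        "Write one question and its answer that can be answered using ONLY the "
        "sentences below. Reply as JSON with keys 'question' and 'answer'.\n"
        + evidence
    )


def synthesize(
    sentences: List[str],
    llm: Callable[[str], str],
    k: int = 2,
    rng: Optional[random.Random] = None,
) -> AttributionExample:
    """Select k sentences, ask the LLM for a QA pair, keep the selection as the label."""
    rng = rng if rng is not None else random.Random()
    k = min(k, len(sentences))
    # Assumption: uniform random selection; the abstract does not say how sentences are chosen.
    selected = sorted(rng.sample(range(len(sentences)), k))
    reply = json.loads(llm(build_prompt(sentences, selected)))
    return AttributionExample(list(sentences), reply["question"], reply["answer"], selected)


def to_training_pair(ex: AttributionExample) -> Tuple[str, str]:
    """Format one example as (input, target) text for fine-tuning a small LM (hypothetical layout)."""
    numbered = "\n".join(f"[{i}] {s}" for i, s in enumerate(ex.context))
    source = (
        f"Context:\n{numbered}\nQuestion: {ex.question}\n"
        f"Answer: {ex.answer}\nSupporting sentences:"
    )
    target = " ".join(f"[{i}]" for i in ex.support)
    return source, target


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call: answers with the first evidence sentence."""
    first_evidence = prompt.splitlines()[1]  # line 0 is the instruction, e.g. "[0] Paris is ..."
    sentence = first_evidence.split("] ", 1)[1]
    return json.dumps({"question": "What does the evidence state?", "answer": sentence})


if __name__ == "__main__":
    context = [
        "Paris is the capital of France.",
        "The Seine flows through Paris.",
        "Berlin is the capital of Germany.",
    ]
    example = synthesize(context, stub_llm, k=2, rng=random.Random(0))
    source, target = to_training_pair(example)
    print(source)
    print("Target:", target)
```

Because the label is the set of sentences handed to the generator, rather than something predicted afterwards, each synthetic example carries its attribution path by construction, which is the property the abstract highlights.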

