CL

Oct 3, 2025

Sample, Align, Synthesize: Graph-Based Response Synthesis with ConGrs

Sayan Ghosh, Shahzaib Saqib Warraich, Dhruv Tarsadiya, Gregory Yauney, Swabha Swayamdipta

AI2

arXiv:2510.03527v12.7

h-index: 18

Originality: Highly original

AI Analysis

This paper addresses the challenge of efficiently leveraging epistemic signals from multiple LM responses, offering a flexible method that applies to factual generation, abstention, and reasoning tasks.

The paper tackles the problem of synthesizing a single response from diverse long-form language model samples by introducing Consensus Graphs (ConGrs), a DAG-based data structure that captures shared information and semantic variation across sampled responses. Reported results include up to 31% higher factual precision in biography generation, more than 80% less reliance on LM judges, abstention rates up to 56% higher on refusal tasks, and accuracy gains of up to 6 points on reasoning tasks.

Language models can be sampled multiple times to access the distribution underlying their responses, but existing methods cannot efficiently synthesize rich epistemic signals across different long-form responses. We introduce Consensus Graphs (ConGrs), a flexible DAG-based data structure that represents both shared information and semantic variation in a set of sampled LM responses to the same prompt. We construct ConGrs using a lightweight lexical sequence alignment algorithm from bioinformatics, supplemented by the targeted use of a secondary LM judge. Further, we design task-dependent decoding methods to synthesize a single, final response from our ConGr data structure. Our experiments show that synthesizing responses from ConGrs improves factual precision on two biography generation tasks by up to 31% over an average response and reduces reliance on LM judges by more than 80% compared to other methods. We also use ConGrs for three refusal-based tasks requiring abstention on unanswerable queries and find that the abstention rate increases by up to 56%. We apply our approach to the MATH and AIME reasoning tasks and find an improvement of up to 6 points of accuracy over self-verification and majority vote baselines. We show that ConGrs provide a flexible method for capturing variation in LM responses and for using the epistemic signals provided by that variation to synthesize more effective responses.
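To make the idea of lexical alignment across sampled responses concrete, here is a minimal sketch. It is not the authors' ConGr construction: the abstract does not name the alignment algorithm, so Needleman-Wunsch global alignment is assumed here as one classic choice from bioinformatics. The sketch also skips the DAG and the secondary LM judge entirely. It only aligns each response to the first one and counts how many responses agree on each token. The function names (`needleman_wunsch`, `token_support`, `consensus`) and the `min_frac` threshold are hypothetical.

```python
from typing import List, Optional, Tuple


def needleman_wunsch(
    a: List[str], b: List[str], match: int = 1, mismatch: int = -1, gap: int = -1
) -> List[Tuple[Optional[int], Optional[int]]]:
    """Globally align token lists a and b.

    Returns index pairs (i, j); None on either side marks a gap.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                pairs.append((i - 1, j - 1))
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None))
            i -= 1
        else:
            pairs.append((None, j - 1))
            j -= 1
    pairs.reverse()
    return pairs


def token_support(responses: List[str]) -> List[Tuple[str, int]]:
    """Count, for each token of the first response, how many responses contain
    an identical token at the aligned position (the first response included)."""
    anchor = responses[0].split()
    support = [1] * len(anchor)
    for other in responses[1:]:
        tokens = other.split()
        for i, j in needleman_wunsch(anchor, tokens):
            if i is not None and j is not None and anchor[i] == tokens[j]:
                support[i] += 1
    return list(zip(anchor, support))


def consensus(responses: List[str], min_frac: float = 0.5) -> str:
    """Keep anchor tokens supported by more than min_frac of the responses."""
    n = len(responses)
    return " ".join(tok for tok, s in token_support(responses) if s / n > min_frac)
```

Calling `token_support` on a handful of sampled biographies shows which tokens every sample agrees on and which ones vary. Those per-token agreement counts are the kind of signal that a consensus structure is meant to expose. A fuller version would merge all responses into a graph with branching paths rather than anchoring on the first response.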
