CL AI LG · Oct 14, 2024

Graph of Records: Boosting Retrieval Augmented Generation for Long-context Summarization with Graphs

arXiv:2410.11001v27.713 citations · h-index: 10 · Has Code · ACL

Originality: Incremental advance

AI Analysis

This work addresses the challenge of improving summarization accuracy for long documents in natural language processing. The advance is incremental, since it builds on existing RAG and graph-based techniques.

The paper tackles suboptimal summarization in retrieval-augmented generation (RAG) by reusing historical LLM-generated responses, which existing approaches usually discard, to improve long-context global summarization. The proposed Graph of Records (GoR) method improves over baseline retrievers on the WCEP dataset by up to 19% in Rouge-2 (15% in Rouge-L and 8% in Rouge-1).

Retrieval-augmented generation (RAG) has revitalized Large Language Models (LLMs) by injecting non-parametric factual knowledge. Compared with long-context LLMs, RAG is considered an effective summarization tool in a more concise and lightweight manner, which can interact with LLMs multiple times using diverse queries to get comprehensive responses. However, the LLM-generated historical responses, which contain potentially insightful information, are largely neglected and discarded by existing approaches, leading to suboptimal results. In this paper, we propose graph of records (GoR), which leverages historical responses generated by LLMs to enhance RAG for long-context global summarization. Inspired by the retrieve-then-generate paradigm of RAG, we construct a graph by establishing an edge between the retrieved text chunks and the corresponding LLM-generated response. To further uncover the intricate correlations between them, GoR features a graph neural network and an elaborately designed BERTScore-based objective for self-supervised model training, enabling seamless supervision signal backpropagation between reference summaries and node embeddings. We comprehensively compare GoR with 12 baselines across four long-context summarization datasets, and the results indicate that our proposed method reaches the best performance (e.g., 15%, 8%, and 19% improvement over retrievers w.r.t. Rouge-L, Rouge-1, and Rouge-2 on the WCEP dataset). Extensive experiments further demonstrate the effectiveness of GoR.
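
The graph construction step described in the abstract (an edge between each retrieved text chunk and the LLM response generated from it) can be illustrated with a minimal sketch. This is not the authors' implementation: the word-overlap `retrieve` function, the `generate` placeholder that stands in for an LLM call, and the name `build_graph_of_records` are all assumptions made for illustration, and the graph neural network and BERTScore-based training objective are left out entirely.

```python
from collections import defaultdict


def retrieve(query, chunks, k=2):
    """Toy retriever (assumption): rank chunks by word overlap with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        range(len(chunks)),
        key=lambda i: -len(query_words & set(chunks[i].lower().split())),
    )
    return ranked[:k]


def generate(query, retrieved_texts):
    """Placeholder for an LLM call (assumption): concatenates the evidence."""
    return f"Answer to '{query}': " + " ".join(retrieved_texts)


def build_graph_of_records(chunks, queries, k=2):
    """Link every generated response to the chunks it was generated from.

    Returns (nodes, edges): nodes is a list of (kind, text) tuples, and
    edges maps a node index to the set of its neighbours (undirected).
    """
    nodes = [("chunk", text) for text in chunks]
    edges = defaultdict(set)
    for query in queries:
        hit_ids = retrieve(query, chunks, k)
        response = generate(query, [chunks[i] for i in hit_ids])
        response_id = len(nodes)
        nodes.append(("response", response))
        # Retrieve-then-generate: one edge per (retrieved chunk, response) pair.
        for chunk_id in hit_ids:
            edges[response_id].add(chunk_id)
            edges[chunk_id].add(response_id)
    return nodes, dict(edges)
```

In a real pipeline the two stub functions would be replaced by an actual retriever and an LLM, and the resulting graph would be passed to a graph neural network to learn node embeddings, as the abstract describes.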
