CL, AI, LG · Jan 7

Disco-RAG: Discourse-Aware Retrieval-Augmented Generation

arXiv:2601.04377v1
Originality: Highly original
AI Analysis

This paper addresses a limitation of existing RAG strategies for knowledge-intensive tasks, namely their flat treatment of retrieved passages, by incorporating discourse structure. It represents an incremental improvement in LLM performance on such tasks.

The paper tackles flat, unstructured retrieval in Retrieval-Augmented Generation (RAG) systems, which limits knowledge synthesis from dispersed evidence. It proposes Disco-RAG, a discourse-aware framework that injects discourse signals into generation and achieves state-of-the-art results on question answering and long-document summarization benchmarks without fine-tuning.

Retrieval-Augmented Generation (RAG) has emerged as an important means of enhancing the performance of large language models (LLMs) in knowledge-intensive tasks. However, most existing RAG strategies treat retrieved passages in a flat and unstructured way, which prevents the model from capturing structural cues and constrains its ability to synthesize knowledge from dispersed evidence across documents. To overcome these limitations, we propose Disco-RAG, a discourse-aware framework that explicitly injects discourse signals into the generation process. Our method constructs intra-chunk discourse trees to capture local hierarchies and builds inter-chunk rhetorical graphs to model cross-passage coherence. These structures are jointly integrated into a planning blueprint that conditions the generation. Experiments on question answering and long-document summarization benchmarks show the efficacy of our approach. Disco-RAG achieves state-of-the-art results on the benchmarks without fine-tuning. These findings underscore the important role of discourse structure in advancing RAG systems.
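The three structures named in the abstract (intra-chunk discourse trees, an inter-chunk rhetorical graph, and a planning blueprint) can be pictured with a small toy sketch. The abstract does not say how any of them are represented or built, so everything below is an assumption made for illustration: the class names `DiscourseNode` and `RhetoricalEdge`, the function `build_blueprint`, the relation labels, and the idea of serializing the blueprint as plain text for a prompt are all hypothetical and are not the authors' implementation. The trees and edges are supplied by hand; no discourse parser is involved.

```python
from dataclasses import dataclass, field


@dataclass
class DiscourseNode:
    """Toy node of an intra-chunk discourse tree (hypothetical representation)."""
    relation: str                      # e.g. "Elaboration"; label set is assumed
    text: str = ""                     # filled in only for leaf spans
    children: list["DiscourseNode"] = field(default_factory=list)


@dataclass
class RhetoricalEdge:
    """Toy edge of an inter-chunk rhetorical graph (hypothetical representation)."""
    source: int                        # index of the source chunk
    target: int                        # index of the target chunk
    relation: str                      # e.g. "Contrast"; label set is assumed


def render_tree(node: DiscourseNode, depth: int = 0) -> list[str]:
    """Flatten a discourse tree into indented text lines, one per node."""
    indent = "  " * depth
    if not node.children:
        return [f"{indent}{node.relation}: {node.text}"]
    lines = [f"{indent}{node.relation}"]
    for child in node.children:
        lines.extend(render_tree(child, depth + 1))
    return lines


def build_blueprint(question: str,
                    chunks: list[str],
                    trees: list[DiscourseNode],
                    edges: list[RhetoricalEdge]) -> str:
    """Serialize chunks, their trees, and cross-chunk edges into one text block.

    Assumption: the 'blueprint' is modeled here as plain text that could be
    prepended to a generation prompt. The paper may do something different.
    """
    parts = [f"Question: {question}", "", "Retrieved chunks:"]
    for i, chunk in enumerate(chunks):
        parts.append(f"[{i}] {chunk}")
        parts.extend("    " + line for line in render_tree(trees[i]))
    parts.append("")
    parts.append("Cross-chunk relations:")
    for edge in edges:
        parts.append(f"[{edge.source}] -{edge.relation}-> [{edge.target}]")
    return "\n".join(parts)
```

In this sketch, the output of `build_blueprint` would be passed as context to an off-the-shelf LLM, in line with the abstract's claim that no fine-tuning is needed. How the real system obtains the trees and graph, and how the blueprint conditions generation, is not stated in the text above.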

