CL, IR · Dec 10, 2024

RAG-based Question Answering over Heterogeneous Data and Text

arXiv:2412.07420v1 · 13 citations · h-index: 8 · IEEE Data Engineering Bulletin
Originality: Incremental advance
AI Analysis

This work addresses efficient and accurate question answering across diverse data types for applications that require low resource usage. The contribution is incremental because it builds on existing RAG architectures.

The paper tackles question answering over heterogeneous data sources (text, tables, knowledge graphs) by introducing the QUASAR system. QUASAR achieves answering quality on par with or better than large GPT models while reducing computational cost and energy consumption by orders of magnitude.

This article presents the QUASAR system for question answering over unstructured text, structured tables, and knowledge graphs, with unified treatment of all sources. The system adopts a RAG-based architecture: a pipeline of evidence retrieval followed by answer generation, where answer generation is powered by a moderate-sized language model. Additionally, and uniquely, QUASAR has components for question understanding, which derives crisper input for evidence retrieval, and for re-ranking and filtering the retrieved evidence so that only the most informative pieces are fed into answer generation. Experiments on three benchmarks demonstrate the high answering quality of our approach, which is on par with or better than large GPT models while keeping computational cost and energy consumption orders of magnitude lower.
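The abstract describes the pipeline only at a high level. The Python sketch below is a toy illustration of its four stages under stated assumptions. It is not QUASAR's implementation, and every function name in it is hypothetical.

The sketch makes the following substitutions:
- Keyword extraction stands in for question understanding.
- Raw keyword overlap stands in for the retriever.
- Jaccard similarity stands in for the re-ranker.
- Building the prompt stands in for the language-model call.

Heterogeneous sources are treated uniformly by verbalizing table rows and KG triples into plain text. This is one common way to give all sources a single representation, assumed here for illustration. The corpus is toy data.

```python
import re
from dataclasses import dataclass

# Toy stopword list (an assumption; not taken from the paper).
STOPWORDS = {"what", "which", "who", "when", "where", "did", "does",
             "is", "the", "of", "a", "an", "in", "for"}


@dataclass(frozen=True)
class Evidence:
    source: str  # "text", "table", or "kg"
    text: str    # verbalized form shared by every downstream stage


def tokenize(s: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", s.lower())


def verbalize_row(table: str, row: dict[str, str]) -> Evidence:
    # Hypothetical format: "<table>: <header> is <value>, ..."
    cells = ", ".join(f"{h} is {v}" for h, v in row.items())
    return Evidence("table", f"{table}: {cells}")


def verbalize_triple(subj: str, pred: str, obj: str) -> Evidence:
    # Hypothetical format: a KG fact rendered as a flat sentence.
    return Evidence("kg", f"{subj} {pred} {obj}")


def understand(question: str) -> list[str]:
    # Stand-in for question understanding: keep content words only.
    return [t for t in tokenize(question) if t not in STOPWORDS]


def retrieve(keywords: list[str], corpus: list[Evidence], k: int = 10) -> list[Evidence]:
    # Cheap, recall-oriented pass: rank by raw keyword overlap.
    kw = set(keywords)
    hits = [(len(kw & set(tokenize(e.text))), e) for e in corpus]
    hits = [(score, e) for score, e in hits if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [e for _, e in hits[:k]]


def rerank_and_filter(keywords: list[str], candidates: list[Evidence],
                      top_n: int = 2, min_score: float = 0.2) -> list[Evidence]:
    # Precision-oriented pass: Jaccard similarity favors short, focused
    # evidence; weak candidates are dropped and at most top_n are kept.
    kw = set(keywords)
    if not kw:
        return []
    scored = []
    for e in candidates:
        toks = set(tokenize(e.text))
        scored.append((len(kw & toks) / len(kw | toks), e))
    kept = [(score, e) for score, e in scored if score >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in kept[:top_n]]


def build_prompt(question: str, evidence: list[Evidence]) -> str:
    # Input for a moderate-sized LM; the model call itself is omitted.
    lines = [f"[{e.source}] {e.text}" for e in evidence]
    return "Evidence:\n" + "\n".join(lines) + f"\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    corpus = [
        Evidence("text", "Lionel Messi joined Inter Miami in 2023."),
        verbalize_row("Ballon d'Or winners", {"year": "2023", "winner": "Lionel Messi"}),
        verbalize_triple("Lionel Messi", "country of citizenship", "Argentina"),
        Evidence("text", "The 2023 season ended in October."),
    ]
    question = "Which club did Lionel Messi join in 2023?"
    keywords = understand(question)
    evidence = rerank_and_filter(keywords, retrieve(keywords, corpus))
    print(build_prompt(question, evidence))
```

The sketch separates a cheap, recall-oriented retrieval pass from a more selective re-ranking and filtering pass. As a result, only a few short evidence snippets reach the language model. In the toy run, the weakly related season sentence and the citizenship fact are dropped before the prompt is built. The kept evidence still mixes a text passage and a verbalized table row.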
