AI · IR · Feb 26, 2025

Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary Documents

arXiv:2502.19596v5 · 1 citation · h-index: 4 · CIKM
Originality: Incremental advance
AI Analysis

This paper addresses the difficulty that employees in industries such as automotive face when trying to access domain-specific knowledge in large, complex internal documents. The contribution is incremental: it adapts existing RAG methods to enterprise constraints.

The paper tackles the problem of retrieving relevant information from heterogeneous proprietary corporate documents, such as automotive crash-collision test reports. It proposes a Retrieval-Augmented Generation (RAG)-based Question Answering (QA) framework that improves factual correctness, informativeness, and helpfulness over a non-RAG baseline, with gains of up to +1.94 points on a 1-5 rating scale.

Proprietary corporate documents contain rich domain-specific knowledge, but their overwhelming volume and disorganized structure make it difficult even for employees to access the right information when needed. For example, in the automotive industry, vehicle crash-collision tests, each costing hundreds of thousands of dollars, produce highly detailed documentation. However, retrieving relevant content during decision-making remains time-consuming due to the scale and complexity of the material. While Retrieval-Augmented Generation (RAG)-based Question Answering (QA) systems offer a promising solution, building an internal RAG-QA system poses several challenges: (1) handling heterogeneous multi-modal data sources, (2) preserving data confidentiality, and (3) enabling traceability between each piece of information in the generated answer and its original source document. To address these challenges, we propose a RAG-QA framework for internal enterprise use, consisting of: (1) a data pipeline that converts raw multi-modal documents into a structured corpus and QA pairs, (2) a fully on-premise, privacy-preserving architecture, and (3) a lightweight reference matcher that links answer segments to supporting content. Applied to the automotive domain, our system improves factual correctness (+1.79, +1.94), informativeness (+1.33, +1.16), and helpfulness (+1.08, +1.67) over a non-RAG baseline, based on 1-5 scale ratings from both human and LLM judges.
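
To make the idea of a reference matcher concrete, the following is a minimal sketch of how answer segments could be linked back to retrieved source chunks. The abstract does not say how the paper's matcher works, so everything here is an assumption for illustration: the function names, the chunk IDs, the sentence-level segmentation, the Jaccard token-overlap score, and the 0.2 threshold are all hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List, Optional, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_sentences(answer: str) -> List[str]:
    """Naive segmentation: split on whitespace that follows ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [p for p in parts if p]


def match_references(
    answer: str,
    chunks: Dict[str, str],
    threshold: float = 0.2,  # assumed cutoff, not a value from the paper
) -> List[Tuple[str, Optional[str], float]]:
    """Link each answer sentence to its best-overlapping source chunk.

    Returns (sentence, chunk_id or None, score) per sentence. A sentence
    whose best score falls below the threshold is left unlinked (None).
    """
    chunk_tokens = {cid: tokenize(text) for cid, text in chunks.items()}
    results = []
    for sentence in split_sentences(answer):
        sent_tokens = tokenize(sentence)
        best_id, best_score = None, 0.0
        for cid, tokens in chunk_tokens.items():
            score = jaccard(sent_tokens, tokens)
            if score > best_score:
                best_id, best_score = cid, score
        if best_score < threshold:
            best_id = None
        results.append((sentence, best_id, best_score))
    return results


if __name__ == "__main__":
    # Hypothetical chunks standing in for passages from the structured corpus.
    chunks = {
        "report_a#p3": "The frontal offset test was run at 64 km/h against a deformable barrier.",
        "report_b#p1": "Side impact testing used a moving barrier at 50 km/h.",
    }
    answer = "The frontal offset test was run at 64 km/h. Lunch is served at noon."
    for sentence, chunk_id, score in match_references(answer, chunks):
        print(f"{chunk_id}\t{score:.2f}\t{sentence}")
```

A purely lexical matcher like this runs without any external service, which is compatible with the on-premise constraint described above. A real system might instead compare embeddings or have the generator emit citations directly.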
