CLJun 29, 2025

LLM-Assisted Question-Answering on Technical Documents Using Structured Data-Aware Retrieval Augmented Generation

arXiv:2506.23136v11 citationsh-index: 1
Originality Incremental advance
AI Analysis

This work addresses the challenge of accurate question-answering for users of technical documents, but it is incremental as it builds on existing RAG methods with specific enhancements for structured data.

The paper tackled the problem of LLMs hallucinating and having outdated knowledge when answering questions on technical documents with structured data like tables and images, by proposing a RAG pipeline that achieved faithfulness scores of 94-96% and answer relevancy scores of 87-93%.

Large Language Models (LLMs) are capable of natural language understanding and generation. But they face challenges such as hallucination and outdated knowledge. Fine-tuning is one possible solution, but it is resource-intensive and must be repeated with every data update. Retrieval-Augmented Generation (RAG) offers an efficient solution by allowing LLMs to access external knowledge sources. However, traditional RAG pipelines struggle with retrieving information from complex technical documents with structured data such as tables and images. In this work, we propose a RAG pipeline, capable of handling tables and images in documents, for technical documents that support both scanned and searchable formats. Its retrieval process combines vector similarity search with a fine-tuned reranker based on Gemma-2-9b-it. The reranker is trained using RAFT (Retrieval-Augmented Fine-Tuning) on a custom dataset designed to improve context identification for question answering. Our evaluation demonstrates that the proposed pipeline achieves a high faithfulness score of 94% (RAGas) and 96% (DeepEval), and an answer relevancy score of 87% (RAGas) and 93% (DeepEval). Comparative analysis demonstrates that the proposed architecture is superior to general RAG pipelines in terms of table-based questions and handling questions outside context.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes