LG, CL | Nov 20, 2024

Retrieval-Augmented Generation for Domain-Specific Question Answering: A Case Study on Pittsburgh and CMU

arXiv:2411.13691v1 | 4.67 citations | h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work is an incremental improvement in answer precision for domain-specific question answering systems.

The researchers built a Retrieval-Augmented Generation (RAG) system for domain-specific question answering about Pittsburgh and CMU. The system outperformed a non-RAG baseline, with the F1 score rising from 5.45% to 42.21% and recall reaching 56.18%.
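
To make the F1 and recall figures above concrete, here is a minimal sketch of how such scores are commonly computed for question answering. It assumes token-level overlap between a predicted answer and a reference answer (the convention used by SQuAD-style evaluation); the summary does not state which variant the authors used, and the function name `token_scores` and the example strings are hypothetical.

```python
from collections import Counter


def token_scores(prediction: str, reference: str) -> tuple[float, float, float]:
    """Return (precision, recall, f1) over lowercase whitespace tokens.

    Assumption: token-overlap scoring; the paper may normalize text differently.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical prediction/reference pair, for illustration only.
    print(token_scores("Carnegie Mellon University", "Carnegie Mellon"))
```

Under this definition, a prediction that contains every reference token plus extra words keeps full recall but loses precision, which is one way recall can exceed F1 as in the reported numbers.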

We designed a Retrieval-Augmented Generation (RAG) system to provide large language models with relevant documents for answering domain-specific questions about Pittsburgh and Carnegie Mellon University (CMU). We extracted over 1,800 subpages using a greedy scraping strategy and employed a hybrid annotation process, combining manual and Mistral-generated question-answer pairs, achieving an inter-annotator agreement (IAA) score of 0.7625. Our RAG framework integrates BM25 and FAISS retrievers, enhanced with a reranker for improved document retrieval accuracy. Experimental results show that the RAG system significantly outperforms a non-RAG baseline, particularly in time-sensitive and complex queries, with an F1 score improvement from 5.45% to 42.21% and recall of 56.18%. This study demonstrates the potential of RAG systems in enhancing answer precision and relevance, while identifying areas for further optimization in document retrieval and model training.
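
The retrieval stage described in the abstract (two retrievers followed by a reranker) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how the BM25 and FAISS results are combined, so reciprocal rank fusion is swapped in here as one common choice. The two input rankings stand in for the output of a BM25 retriever and a FAISS index, and `rerank_score` is a hypothetical callable standing in for the reranker model.

```python
from collections import defaultdict
from typing import Callable, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one list.

    Each document earns 1 / (k + rank) from every list it appears in.
    Assumption: RRF is an illustrative fusion rule, not taken from the paper.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def retrieve(
    sparse_ranking: Sequence[str],
    dense_ranking: Sequence[str],
    rerank_score: Callable[[str], float],
    top_k: int = 3,
) -> list[str]:
    """Fuse sparse (BM25-style) and dense (FAISS-style) rankings, then rerank."""
    candidates = reciprocal_rank_fusion([sparse_ranking, dense_ranking])
    # The reranker rescores every fused candidate; only the best top_k survive.
    return sorted(candidates, key=rerank_score, reverse=True)[:top_k]


if __name__ == "__main__":
    # Hypothetical document IDs and reranker scores, for illustration only.
    toy_scores = {"a": 0.1, "b": 0.9, "c": 0.5, "d": 0.2}
    print(retrieve(["a", "b", "c"], ["a", "c", "d"], toy_scores.__getitem__, top_k=2))
```

Fusing on ranks rather than raw scores avoids having to put BM25 scores and vector similarities on a common scale, which is why it is a convenient default when the two retrievers are otherwise independent.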
