Classifying and Addressing the Diversity of Errors in Retrieval-Augmented Generation Systems
This work supports robust deployment of RAG systems by providing tools to classify and mitigate errors; its contribution is incremental, since it builds on existing RAG frameworks.
The paper tackles the problem of erroneous outputs in retrieval-augmented generation (RAG) systems by presenting a new taxonomy of error types, a curated dataset of annotated errors, and an auto-evaluation method to track and address these errors in practice.
Retrieval-augmented generation (RAG) is a prevalent approach for building LLM-based question-answering systems that can take advantage of external knowledge databases. Due to the complexity of real-world RAG systems, there are many potential causes for erroneous outputs. Understanding the range of errors that can occur in practice is crucial for robust deployment. We present a new taxonomy of the error types that can occur in realistic RAG systems, examples of each, and practical advice for addressing them. Additionally, we curate a dataset of erroneous RAG responses annotated by error types. We then propose an auto-evaluation method aligned with our taxonomy that can be used in practice to track and address errors during development. Code and data are available at https://github.com/layer6ai-labs/rag-error-classification.