Observations on Building RAG Systems for Technical Documents
This work addresses challenges in building RAG systems for technical documents, but its contribution is incremental: it reviews and experiments with existing methods rather than introducing new ones.
The paper addresses the problem of building retrieval-augmented generation (RAG) systems for technical documents, where embeddings often fail to capture domain information. It reviews prior art and conducts experiments to identify best practices and challenges.
Retrieval-augmented generation (RAG) for technical documents is challenging because embeddings often fail to capture domain information. We review prior art on the important factors affecting RAG and perform experiments to highlight best practices and potential challenges in building RAG systems for technical documents.