CL, CY, IR | Aug 21, 2024

Ancient Wisdom, Modern Tools: Exploring Retrieval-Augmented LLMs for Ancient Indian Philosophy

arXiv:2408.11903v2 | 26 citations | h-index: 1

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of applying LLMs to specialized, long-tail knowledge domains such as ancient Indian philosophy. The contribution is incremental: it adapts existing RAG methods to a new dataset.

The paper targets the factual inaccuracies and hallucinations that LLMs produce in specialized domains. It develops a retrieval-augmented generation (RAG) model for long-form question answering on ancient Indian philosophy, built on the VedantaNY-10M dataset. In human evaluations, the RAG model significantly outperformed a standard non-RAG LLM, producing more factual and comprehensive responses with fewer hallucinations.

LLMs have revolutionized the landscape of information retrieval and knowledge dissemination. However, their application in specialized areas is often hindered by factual inaccuracies and hallucinations, especially in long-tail knowledge distributions. We explore the potential of retrieval-augmented generation (RAG) models for long-form question answering (LFQA) in a specialized knowledge domain. We present VedantaNY-10M, a dataset curated from extensive public discourses on the ancient Indian philosophy of Advaita Vedanta. We develop and benchmark a RAG model against a standard, non-RAG LLM, focusing on transcription, retrieval, and generation performance. Human evaluations by computational linguists and domain experts show that the RAG model significantly outperforms the standard model in producing factual and comprehensive responses having fewer hallucinations. In addition, a keyword-based hybrid retriever that emphasizes unique low-frequency terms further improves results. Our study provides insights into effectively integrating modern large language models with ancient knowledge systems. Project page with dataset and code: https://sites.google.com/view/vedantany-10m
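The idea of a hybrid retriever that emphasizes unique low-frequency terms can be illustrated with a small sketch. This is not the authors' implementation: the function names (`idf_table`, `keyword_score`, `hybrid_rank`), the mixing weight `alpha`, and the smoothed IDF formula are all assumptions made for illustration, and a plain term-count vector stands in for the dense embedding model a real RAG system would use.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would handle punctuation.
    return text.lower().split()


def idf_table(docs):
    # Smoothed inverse document frequency: rare terms get larger weights.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def keyword_score(query, doc, idf):
    # Sum the IDF weights of query terms that appear in the document.
    doc_terms = set(tokenize(doc))
    return sum(idf.get(t, 0.0) for t in set(tokenize(query)) if t in doc_terms)


def count_vector(text, vocab):
    # Term-count vector; a stand-in (assumption) for a dense embedding.
    counts = Counter(tokenize(text))
    return [counts.get(t, 0) for t in vocab]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_rank(query, docs, alpha=0.5):
    # Blend vector similarity with a normalized IDF-weighted keyword score.
    idf = idf_table(docs)
    vocab = sorted(idf)
    qv = count_vector(query, vocab)
    kw = [keyword_score(query, d, idf) for d in docs]
    max_kw = max(kw, default=0.0) or 1.0
    scored = []
    for i, doc in enumerate(docs):
        dense = cosine(qv, count_vector(doc, vocab))
        scored.append((alpha * dense + (1 - alpha) * kw[i] / max_kw, i))
    # Highest score first; ties broken by original document order.
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]


passages = [
    "the self is the witness of the mind",
    "the mind is restless and hard to control",
    "turiya is the fourth state beyond the mind",
]
ranking = hybrid_rank("what is turiya", passages)
print(ranking)
```

In this toy corpus, a rare domain term such as "turiya" carries a higher IDF weight than common words like "the", so the keyword component pulls the passage containing it to the top even when the vector similarity alone is weak.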

