CLAIIR · Jun 14, 2025

DoTA-RAG: Dynamic of Thought Aggregation RAG

arXiv:2506.12571v1 · 2 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the need for fast, reliable access to large and evolving knowledge sources in practical domains. It represents a strong, specific gain rather than a foundational advance.

The paper tackles the problem of high latency and limited accuracy in retrieval-augmented generation systems over large-scale web knowledge indexes, achieving an improvement in answer correctness score from 0.752 to 1.478 while maintaining low latency.

In this paper, we introduce DoTA-RAG (Dynamic-of-Thought Aggregation RAG), a retrieval-augmented generation system optimized for high-throughput, large-scale web knowledge indexes. Traditional RAG pipelines often suffer from high latency and limited accuracy over massive, diverse datasets. DoTA-RAG addresses these challenges with a three-stage pipeline: query rewriting, dynamic routing to specialized sub-indexes, and multi-stage retrieval and ranking. We further enhance retrieval by evaluating and selecting a superior embedding model, re-embedding the large FineWeb-10BT corpus. Moreover, we create a diverse Q&A dataset of 500 questions generated via the DataMorgana setup across a broad range of WebOrganizer topics and formats. DoTA-RAG improves the answer correctness score from 0.752 (baseline, using LiveRAG pre-built vector store) to 1.478 while maintaining low latency, and it achieves a 0.929 correctness score on the Live Challenge Day. These results highlight DoTA-RAG's potential for practical deployment in domains requiring fast, reliable access to large and evolving knowledge sources.
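The three stages named in the abstract (query rewriting, routing to a sub-index, then retrieval followed by ranking) can be illustrated with a toy sketch. This is not the authors' implementation: the in-memory `SUB_INDEXES`, the token-overlap scoring, and every function name below are hypothetical stand-ins for the LLM rewriter, the router, and the embedding-based retriever and reranker that a real system would use.

```python
from collections import Counter

# Hypothetical toy sub-indexes keyed by topic. The real system works over a
# re-embedded FineWeb-10BT corpus, which is not modelled here.
SUB_INDEXES = {
    "health": [
        "Regular exercise lowers blood pressure.",
        "Vitamin C is found in citrus fruit.",
    ],
    "technology": [
        "A vector store holds document embeddings.",
        "Query rewriting can improve retrieval recall.",
    ],
}


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (w.strip(".,?!").lower() for w in text.split())
    return [t for t in tokens if t]


def rewrite_query(query):
    """Stage 1 stand-in: normalise case and whitespace (assumption: a real
    system would likely rewrite the query with a language model)."""
    return " ".join(query.lower().split())


def route(query, indexes):
    """Stage 2 stand-in: choose the sub-index whose vocabulary overlaps the
    query the most."""
    q = set(tokenize(query))

    def overlap(topic):
        vocab = set()
        for doc in indexes[topic]:
            vocab.update(tokenize(doc))
        return len(q & vocab)

    return max(indexes, key=overlap)


def retrieve(query, docs, k=2):
    """Stage 3a stand-in: first-pass retrieval by shared-token count."""
    q = Counter(tokenize(query))
    scored = [(sum((q & Counter(tokenize(d))).values()), d) for d in docs]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(query, candidates):
    """Stage 3b stand-in: reorder candidates by Jaccard similarity."""
    q = set(tokenize(query))

    def jaccard(doc):
        t = set(tokenize(doc))
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    return sorted(candidates, key=jaccard, reverse=True)


def answer_context(query, indexes=SUB_INDEXES, k=2):
    """Run rewrite, route, retrieve, and rerank in sequence."""
    rewritten = rewrite_query(query)
    topic = route(rewritten, indexes)
    candidates = retrieve(rewritten, indexes[topic], k=k)
    return topic, rerank(rewritten, candidates)
```

Calling `answer_context("How does QUERY rewriting improve retrieval?")` routes the query to the toy `technology` sub-index and ranks the query-rewriting sentence first. Routing before retrieval means only one sub-index is scored per query, which is the general idea behind keeping latency low on a large corpus.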
