IR | LG | Mar 1

Tiny-Critic RAG: Empowering Agentic Fallback with Parameter-Efficient Small Language Models

arXiv:2603.00846v1 | 2 citations | h-index: 7
Originality: Incremental advance
AI Analysis

The paper addresses the high computational cost and latency of autonomous agent systems and offers a cost-effective option for deployment. The advance is incremental, since it builds on existing RAG frameworks.

The paper tackles the computational inefficiency of using large language models for binary routing in agentic RAG systems by proposing Tiny-Critic RAG, which uses a parameter-efficient small language model to achieve routing accuracy comparable to GPT-4o-mini while reducing latency by an order of magnitude.

Retrieval-Augmented Generation (RAG) grounds Large Language Models (LLMs) to mitigate factual hallucinations. Recent paradigms shift from static pipelines to Modular and Agentic RAG frameworks, granting models autonomy for multi-hop reasoning or self-correction. However, current reflective RAG heavily relies on massive LLMs as universal evaluators. In high-throughput systems, executing complete forward passes for billion-parameter models merely for binary routing introduces severe computational redundancy. Furthermore, in autonomous agent scenarios, inaccurate retrieval causes models to expend excessive tokens on spurious reasoning and redundant tool calls, inflating Time-to-First-Token (TTFT) and costs. We propose Tiny-Critic RAG, decoupling evaluation by deploying a parameter-efficient Small Language Model (SLM) via Low-Rank Adaptation (LoRA). Acting as a deterministic gatekeeper, Tiny-Critic employs constrained decoding and non-thinking inference modes for ultra-low latency binary routing. Evaluations on noise-injected datasets demonstrate Tiny-Critic RAG achieves routing accuracy comparable to GPT-4o-mini while reducing latency by an order of magnitude, establishing a highly cost-effective paradigm for agent deployment.
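
The gatekeeping step described in the abstract can be illustrated with a minimal sketch of constrained binary decoding followed by routing. This is not the authors' implementation: the label tokens `PASS` and `FAIL`, the 0.5 threshold, the route names `generate` and `fallback`, and the `toy_critic_logits` stand-in for a LoRA-tuned SLM are all assumptions made for illustration. The sketch only shows the general idea that a single decoding step is restricted to two label tokens, so the critic can never emit free-form reasoning text.

```python
import math
from typing import Callable, Dict, Sequence

# Hypothetical label tokens; the abstract does not specify the label vocabulary.
ALLOWED_LABELS = ("PASS", "FAIL")

Logits = Dict[str, float]


def constrained_binary_decode(
    logits: Logits, allowed: Sequence[str] = ALLOWED_LABELS
) -> Dict[str, float]:
    """Softmax restricted to the allowed label tokens.

    Every token outside `allowed` is masked out, which is the core of
    constrained decoding for a single-step binary decision.
    """
    missing = [tok for tok in allowed if tok not in logits]
    if missing:
        raise KeyError(f"critic produced no logit for: {missing}")
    peak = max(logits[tok] for tok in allowed)  # subtract max for stability
    exps = {tok: math.exp(logits[tok] - peak) for tok in allowed}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def route(
    query: str,
    passages: Sequence[str],
    critic_logits: Callable[[str, Sequence[str]], Logits],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> str:
    """Return 'generate' if the critic accepts the retrieval, else 'fallback'."""
    probs = constrained_binary_decode(critic_logits(query, passages))
    return "generate" if probs["PASS"] >= threshold else "fallback"


def toy_critic_logits(query: str, passages: Sequence[str]) -> Logits:
    """Stand-in for the small critic model: crude word overlap, not a real SLM.

    It also emits a high-scoring free-text token to show that masking
    removes it from the decision.
    """
    words = query.lower().split()
    overlap = any(word in passage.lower() for passage in passages for word in words)
    return {"PASS": 2.0 if overlap else -1.0, "FAIL": 0.0, "Well,": 5.0}


if __name__ == "__main__":
    print(route("capital of France", ["Paris is the capital of France."], toy_critic_logits))
    print(route("capital of France", ["Bananas are yellow."], toy_critic_logits))
```

In a real deployment the stub would be replaced by one forward pass of the small model, with the logits read at the first generated position. What the fallback route triggers (for example re-retrieval or a tool call) is left open here because the abstract does not specify it.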

