CL | Apr 25, 2025

Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family

arXiv:2504.18225v1 | 1 citation | h-index: 7 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the need for efficient and factual RAG models for deployment on constrained infrastructure, though it is incremental in building on existing RAG workflows.

The paper tackles the problem of improving retrieval-augmented generation (RAG) with small language models by introducing Pleias-RAG-350m and Pleias-RAG-1B, which outperform SLMs below 4B parameters on benchmarks like HotPotQA and 2wiki and are competitive with larger models such as Qwen-2.5-7B.

We introduce a new generation of small reasoning models for RAG, search, and source summarization. Pleias-RAG-350m and Pleias-RAG-1B are mid-trained on a large synthetic dataset emulating the retrieval of a wide variety of multilingual open sources from the Common Corpus. They provide native support for citation and grounding with literal quotes and reintegrate multiple features associated with RAG workflows, such as query routing, query reformulation, and source reranking. Pleias-RAG-350m and Pleias-RAG-1B outperform SLMs below 4 billion parameters on standardized RAG benchmarks (HotPotQA, 2wiki) and are competitive with popular larger models, including Qwen-2.5-7B, Llama-3.1-8B, and Gemma-3-4B. They are the only SLMs to date maintaining consistent RAG performance across leading European languages and ensuring systematic reference grounding for statements. Due to their size and ease of deployment on constrained infrastructure and higher factuality by design, the models unlock a range of new use cases for generative AI.
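The grounding idea in the abstract, statements backed by literal quotes from the sources, lends itself to a simple downstream check. The sketch below is a minimal illustration, not the models' actual output format: the `[n: "quote"]` citation syntax, the `check_quotes` helper, and the toy source strings are all hypothetical, since the abstract does not specify how citations are serialized. It only verifies that each quoted span appears verbatim (after whitespace and case normalization) in the source it points to.

```python
import re

# Hypothetical citation syntax, for illustration only: [n: "literal quote"].
# This is an assumption, NOT the documented Pleias-RAG output format.
CITATION = re.compile(r'\[(\d+):\s*"([^"]+)"\]')


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so line wrapping does not break matching."""
    return " ".join(text.split()).lower()


def check_quotes(answer: str, sources: dict[int, str]) -> list[tuple[int, str, bool]]:
    """Return (source_id, quote, grounded) for every citation found in the answer.

    A quote counts as grounded only if it is non-empty and appears verbatim
    (after normalization) in the source it cites. Unknown source ids are
    treated as ungrounded.
    """
    results = []
    for match in CITATION.finditer(answer):
        source_id = int(match.group(1))
        quote = match.group(2)
        needle = normalize(quote)
        haystack = normalize(sources.get(source_id, ""))
        grounded = bool(needle) and needle in haystack
        results.append((source_id, quote, grounded))
    return results


# Toy data, invented for the example.
sources = {
    1: "The library opens at nine on weekdays.",
    2: "Membership is free for local residents.",
}
answer = (
    'The library has fixed hours [1: "opens at nine"]. '
    'It is closed on weekends [2: "closed on Sundays"].'
)

report = check_quotes(answer, sources)
for source_id, quote, grounded in report:
    print(source_id, repr(quote), "grounded" if grounded else "NOT FOUND")
```

A check like this catches only fabricated or altered quotes; it says nothing about whether the quoted passage actually supports the statement it is attached to.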
