CL · May 29, 2025

Data-efficient Meta-models for Evaluation of Context-based Questions and Answers in LLMs

Julia Belikova, Konstantin Polev, Rauf Parchiev, Dmitry Simakov

arXiv:2505.23299v14.91 citations · h-index: 4 · SIGIR

Originality: Incremental advance

AI Analysis

This work addresses scalability for industrial deployment in annotation-constrained scenarios, but the advance is incremental because it builds on existing SOTA frameworks.

The paper tackles data annotation bottlenecks in hallucination detection for LLMs and RAG systems by proposing a method that reduces training data requirements, achieving performance comparable to proprietary baselines with only 250 training samples.

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly deployed in industry applications, yet their reliability remains hampered by challenges in detecting hallucinations. While supervised state-of-the-art (SOTA) methods that leverage LLM hidden states -- such as activation tracing and representation analysis -- show promise, their dependence on extensively annotated datasets limits scalability in real-world applications. This paper addresses the critical bottleneck of data annotation by investigating the feasibility of reducing training data requirements for two SOTA hallucination detection frameworks: Lookback Lens, which analyzes attention head dynamics, and probing-based approaches, which decode internal model representations. We propose a methodology combining efficient classification algorithms with dimensionality reduction techniques to minimize sample size demands while maintaining competitive performance. Evaluations on standardized question-answering RAG benchmarks show that our approach achieves performance comparable to strong proprietary LLM-based baselines with only 250 training samples. These results highlight the potential of lightweight, data-efficient paradigms for industrial deployment, particularly in annotation-constrained scenarios.
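
The abstract's recipe of pairing dimensionality reduction with a lightweight classifier can be illustrated with a small sketch. The abstract does not name the specific algorithms, so PCA and logistic regression are used here purely as stand-ins, and the features are synthetic Gaussian vectors rather than real LLM hidden states or attention statistics. All function names (`fit_pca`, `apply_pca`, `fit_logreg`, `predict`, `sample`) and all parameter values are hypothetical; only the training-set size of 250 comes from the abstract.

```python
import numpy as np

def fit_pca(X, k):
    """Fit PCA on X and return (mean, top-k components, per-component std)."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    scale = s[:k] / np.sqrt(len(X) - 1)  # std of each projected component
    return mean, vt[:k], scale

def apply_pca(X, mean, components, scale):
    """Project X onto the components and rescale to unit training variance."""
    return (X - mean) @ components.T / scale

def fit_logreg(Z, y, lr=0.5, steps=500, l2=1e-2):
    """L2-regularised logistic regression trained by plain gradient descent."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(steps):
        logits = np.clip(Z @ w + b, -30, 30)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-logits))
        err = p - y
        w -= lr * (Z.T @ err / len(y) + l2 * w)
        b -= lr * err.mean()
    return w, b

def predict(Z, w, b):
    """Return 1 (hallucinated) where the logit is positive, else 0 (grounded)."""
    return (Z @ w + b > 0).astype(int)

# Synthetic stand-in for detector features: 512-dimensional vectors whose
# class signal lies along a single hidden direction (an assumption made
# only so the example has something to learn).
rng = np.random.default_rng(0)
dim, n_train, n_test = 512, 250, 1000
direction = rng.normal(size=dim)
direction /= np.linalg.norm(direction)

def sample(n):
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, dim)) + 3.0 * np.outer(2 * y - 1, direction)
    return X, y

X_train, y_train = sample(n_train)
X_test, y_test = sample(n_test)

# Reduce 512 dimensions to 16 before classifying, so the classifier has far
# fewer parameters than training samples.
pca = fit_pca(X_train, k=16)
w, b = fit_logreg(apply_pca(X_train, *pca), y_train)
accuracy = (predict(apply_pca(X_test, *pca), w, b) == y_test).mean()
print(f"test accuracy with {n_train} training samples: {accuracy:.3f}")
```

The point of the sketch is the shape of the pipeline, not the numbers: with more feature dimensions than labelled samples, projecting to a low-dimensional space first keeps a simple classifier from overfitting. Accuracy on this toy data says nothing about the paper's reported results.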
