CL ML · Jun 2, 2025

MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations

arXiv:2506.01367v3 · 2.7 · h-index: 4

Originality: Incremental advance

AI Analysis

The paper addresses a critical issue, unreliable LLM outputs in applications that require factual accuracy, though it is an incremental improvement over existing detection methods.

The paper tackles hallucination detection in large language models. It proposes MMD-Flagger, a method that uses Maximum Mean Discrepancy to track how outputs vary as the sampling temperature changes, and it reports competitive performance on machine translation and summarization datasets.

Large language models (LLMs) have become pervasive in our everyday life. Yet, a fundamental obstacle prevents their use in many critical applications: their propensity to generate fluent, human-quality content that is not grounded in reality. The detection of such hallucinations is thus of the highest importance. In this work, we propose a new method to flag hallucinated content: MMD-Flagger. It relies on Maximum Mean Discrepancy (MMD), a non-parametric distance between distributions. On a high-level perspective, MMD-Flagger tracks the MMD between the output to inspect and counterparts generated with various temperature parameters. We show empirically that inspecting the shape of this trajectory is sufficient to detect most hallucinations. This novel method is benchmarked on machine translation and summarization datasets, on which it exhibits competitive performance relative to natural competitors.
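To make the idea in the abstract concrete, here is a minimal sketch of an MMD trajectory across temperatures. It is not the authors' implementation. It assumes that the output to inspect and each set of regenerated outputs have already been mapped to fixed-size vectors (the embedding step is not shown), and it uses the standard biased estimator of squared MMD with a Gaussian kernel. The function names, the bandwidth value, and the synthetic data are all hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


def mmd_trajectory(reference, samples_by_temperature, bandwidth=1.0):
    """Squared MMD between a reference sample and each temperature's sample.

    samples_by_temperature maps a temperature (float) to an array of vectors
    assumed to represent outputs generated at that temperature.
    Returns a dict ordered by increasing temperature.
    """
    return {
        t: mmd_squared(reference, samples_by_temperature[t], bandwidth)
        for t in sorted(samples_by_temperature)
    }


# Synthetic stand-in data: higher "temperature" means a larger shift away
# from the reference. Real inputs would come from a model and an embedding.
rng = np.random.default_rng(0)
reference = rng.normal(size=(50, 8))
samples = {
    t: rng.normal(size=(50, 8)) + t
    for t in (0.2, 0.6, 1.0, 1.4)
}
for temperature, value in mmd_trajectory(reference, samples).items():
    print(f"temperature={temperature:.1f}  mmd^2={value:.4f}")
```

The abstract states only that the shape of this trajectory is inspected, so the sketch deliberately stops at computing the values and does not guess at a decision rule for flagging.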
