CL · Nov 14, 2025

Can LLMs Detect Their Own Hallucinations?

Sora Kadotani, Kosuke Nishida, Kyosuke Nishida

arXiv:2511.11087v1 · 2.7 · h-index: 8

Originality: Incremental advance

AI Analysis

This paper addresses hallucination detection for users who rely on LLM outputs, but the advance is incremental because it builds on existing methods such as Chain-of-Thought.

The paper asks whether large language models (LLMs) can detect their own hallucinations, and finds that GPT-3.5 Turbo with Chain-of-Thought detected 58.2% of its own hallucinations.

Large language models (LLMs) can generate fluent responses, but sometimes hallucinate facts. In this paper, we investigate whether LLMs can detect their own hallucinations. We formulate hallucination detection as a classification task of a sentence. We propose a framework for estimating LLMs' capability of hallucination detection and a classification method using Chain-of-Thought (CoT) to extract knowledge from their parameters. The experimental results indicated that GPT-3.5 Turbo with CoT detected 58.2% of its own hallucinations. We concluded that LLMs with CoT can detect hallucinations if sufficient knowledge is contained in their parameters.
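
The sentence-level classification setup in the abstract above can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the prompt wording in `COT_PROMPT`, the `generate` callable standing in for an LLM call, the verdict format, and the reading of "detection rate" as the fraction of truly hallucinated sentences that get flagged.

```python
from typing import Callable, Sequence

# Hypothetical prompt wording; the paper's actual prompt is not given in this summary.
COT_PROMPT = (
    "Sentence: {sentence}\n"
    "First, recall step by step what you know about the facts in this sentence.\n"
    "Then answer on the final line with exactly 'Verdict: hallucinated' "
    "or 'Verdict: factual'."
)


def classify_with_cot(sentence: str, generate: Callable[[str], str]) -> bool:
    """Return True if the model's final line flags the sentence as hallucinated.

    `generate` is a placeholder for any text-in, text-out LLM call.
    """
    reply = generate(COT_PROMPT.format(sentence=sentence))
    lines = [line.strip() for line in reply.splitlines() if line.strip()]
    last = lines[-1].lower() if lines else ""
    return last == "verdict: hallucinated"


def detection_rate(is_hallucinated: Sequence[bool], flagged: Sequence[bool]) -> float:
    """Fraction of truly hallucinated sentences that were flagged (assumed metric)."""
    if len(is_hallucinated) != len(flagged):
        raise ValueError("label and prediction sequences must have equal length")
    total = sum(1 for truth in is_hallucinated if truth)
    if total == 0:
        raise ValueError("no hallucinated sentences to evaluate")
    hits = sum(1 for truth, pred in zip(is_hallucinated, flagged) if truth and pred)
    return hits / total
```

In use, `generate` would wrap a real model call, and `detection_rate` would be computed over sentences the same model produced and that were labeled as hallucinated or factual by some external check.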
