CL, Sep 18, 2025

Quantifying Self-Awareness of Knowledge in Large Language Models

arXiv:2509.15339v1 | 1 citation | h-index: 13

Originality: Incremental advance

AI Analysis

This work addresses how to assess self-awareness in LLMs accurately, a question relevant to both researchers and practitioners, though its contribution is an incremental improvement over existing methods.

The paper asks whether hallucination prediction in large language models reflects true self-awareness or merely question-side shortcuts, and finds that much of the reported success stems from superficial patterns in the questions. It introduces SCAO, which strengthens the use of model-side signals and performs strongly, especially when question-side cues are reduced.

Hallucination prediction in large language models (LLMs) is often interpreted as a sign of self-awareness. However, we argue that such performance can arise from question-side shortcuts rather than true model-side introspection. To disentangle these factors, we propose the Approximate Question-side Effect (AQE), which quantifies the contribution of question-awareness. Our analysis across multiple datasets reveals that much of the reported success stems from exploiting superficial patterns in questions. We further introduce SCAO (Semantic Compression by Answering in One word), a method that enhances the use of model-side signals. Experiments show that SCAO achieves strong and consistent performance, particularly in settings with reduced question-side cues, highlighting its effectiveness in fostering genuine self-awareness in LLMs.
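
The abstract does not give the formula for AQE, so the following is only a minimal sketch of the underlying idea, not the paper's definition: compare a hallucination predictor that sees only the question with one that also uses model-side signals, and ask how much of the above-chance performance the question-only predictor already reaches. The function name `question_side_share`, the ratio it computes, and the toy scores are all hypothetical and chosen for illustration.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def question_side_share(
    question_only_scores: Sequence[float],
    full_scores: Sequence[float],
    labels: Sequence[int],
) -> float:
    """Hypothetical ratio (NOT the paper's AQE): the fraction of the full
    predictor's above-chance AUROC that a question-only predictor reaches."""
    q_gain = auroc(question_only_scores, labels) - 0.5
    f_gain = auroc(full_scores, labels) - 0.5
    if f_gain <= 0:
        raise ValueError("full predictor is not better than chance")
    return q_gain / f_gain


# Toy data, invented for illustration: label 1 means the answer was hallucinated.
labels = [1, 1, 0, 0, 1, 0]
question_only = [0.9, 0.4, 0.5, 0.2, 0.7, 0.3]  # scores from the question text alone
full = [0.9, 0.8, 0.3, 0.1, 0.7, 0.2]           # scores that also use model-side signals

share = question_side_share(question_only, full, labels)
print(f"question-side share of above-chance AUROC: {share:.3f}")
```

A share close to 1 would suggest that the predictor's success comes mostly from patterns in the questions, which is the concern the abstract raises; a lower share would leave more room for model-side signals.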
