CL · AI · Feb 21

Think$^{2}$: Grounded Metacognitive Reasoning in Large Language Models

arXiv:2602.18806v1 · 1 citation
Originality: Incremental advance
AI Analysis

The paper addresses unreliable reasoning and error handling in AI systems, which matters to users who need robust and transparent AI. The advance is incremental because it builds on existing cognitive theories.

The paper addresses limited error monitoring and self-correction in large language models by introducing a psychologically grounded metacognitive framework based on Ann Brown's regulatory cycle. It reports a threefold increase in successful self-correction and an 84% aggregate human preference, on trustworthiness and metacognitive self-awareness, over standard and Chain-of-Thought baselines.

Large Language Models (LLMs) demonstrate strong reasoning performance, yet their ability to reliably monitor, diagnose, and correct their own errors remains limited. We introduce a psychologically grounded metacognitive framework that operationalizes Ann Brown's regulatory cycle (Planning, Monitoring, and Evaluation) as a structured prompting architecture, and study its integration within a lightweight dual-process MetaController for adaptive effort allocation. Across diverse reasoning and diagnostic benchmarks (GSM8K, CRUXEval, MBPP, AIME, CorrectBench, and TruthfulQA) using Llama-3 and Qwen-3 (8B), explicit regulatory structuring substantially improves error diagnosis and yields a threefold increase in successful self-correction. Blinded human evaluations over 580 query pairs show an 84% aggregate preference for trustworthiness and metacognitive self-awareness over standard and Chain-of-Thought baselines. Grounding LLM reasoning in established cognitive theory offers a principled path toward more transparent and diagnostically robust AI systems.
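
The abstract describes the architecture only at a high level, so the following is a minimal illustrative sketch rather than the authors' implementation. It assumes a dual-process controller that answers easy queries with a single call and sends harder ones through three prompts named after Brown's Planning, Monitoring, and Evaluation phases. The prompt wording, the `looks_hard` heuristic, the `Trace` record, and the `meta_controller` function are all hypothetical names introduced here for illustration; `llm` stands for any callable that maps a prompt string to a completion string.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: prompt in, completion out.
LLM = Callable[[str], str]

# Hypothetical prompt templates, one per phase of the regulatory cycle.
PLAN = "Plan: list the steps needed to solve the problem.\nProblem: {q}"
MONITOR = ("Monitor: follow the plan and flag any step you are unsure of.\n"
           "Problem: {q}\nPlan: {plan}")
EVALUATE = ("Evaluate: check the draft for errors and give a corrected answer.\n"
            "Problem: {q}\nDraft: {draft}")


@dataclass
class Trace:
    route: str   # "fast" or "slow"
    answer: str  # final completion returned to the caller
    calls: int   # number of model calls spent on this query


def looks_hard(q: str, max_words: int = 12) -> bool:
    """Toy difficulty heuristic (an assumption, not from the paper):
    long queries or queries with two or more numbers take the slow path."""
    tokens = q.split()
    numbers = sum(tok.strip(".,?").isdigit() for tok in tokens)
    return len(tokens) > max_words or numbers >= 2


def meta_controller(q: str, llm: LLM,
                    is_hard: Callable[[str], bool] = looks_hard) -> Trace:
    """Route a query to a single direct call or to the three-phase cycle."""
    if not is_hard(q):
        return Trace("fast", llm(q), 1)
    plan = llm(PLAN.format(q=q))                  # Planning
    draft = llm(MONITOR.format(q=q, plan=plan))   # Monitoring
    answer = llm(EVALUATE.format(q=q, draft=draft))  # Evaluation
    return Trace("slow", answer, 3)
```

With a real model, `llm` would wrap an inference call to something like Llama-3 or Qwen-3. The `calls` field makes the effort allocation visible: one call on the fast path, three on the slow path.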

