HCAI, May 7, 2024

Generative AI as a metacognitive agent: A comparative mixed-method study with human participants on ICF-mimicking exam performance

arXiv:2405.05285v1, 4 citations, h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the problem of evaluating AI's metacognitive abilities for coaching competency assessment. Its implications are incremental and concern the development of AI simulators and more autonomous systems.

This study compared the metacognitive capabilities of five advanced large language models (LLMs) with those of humans on an exam mimicking the International Coaching Federation (ICF) exam. LLMs outperformed humans across all metacognitive metrics, most notably by showing less overconfidence, though both groups showed limited adaptability in ambiguous scenarios.

This study investigates the metacognitive capabilities of Large Language Models (LLMs) relative to human metacognition in the context of an exam mimicking the International Coaching Federation (ICF) exam, a situational judgment test related to coaching competencies. Using a mixed-method approach, we assessed the metacognitive performance (sensitivity, accuracy of probabilistic predictions, and bias) of human participants and five advanced LLMs (GPT-4, Claude 3 Opus, Mistral Large, Llama 3, and Gemini 1.5 Pro). The results indicate that LLMs outperformed humans across all metacognitive metrics, particularly in showing less overconfidence. However, both LLMs and humans showed less adaptability in ambiguous scenarios, adhering closely to predefined decision frameworks. The study suggests that Generative AI can effectively engage in human-like metacognitive processing without conscious awareness. Implications of the study are discussed in relation to the development of AI simulators that scaffold the cognitive and metacognitive aspects of mastering coaching competencies. More broadly, implications of these results are discussed in relation to the development of metacognitive modules that lead towards more autonomous and intuitive AI systems.
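To make the three metric families named in the abstract concrete, the following is a minimal sketch of how sensitivity, accuracy of probabilistic predictions, and bias are commonly computed from per-item confidence ratings. The abstract does not state the paper's exact formulas, so the choices here (Brier score for probabilistic accuracy, mean confidence minus accuracy for bias, and a type-2 AUROC for sensitivity) are assumptions, and all function names and the sample data are hypothetical.

```python
from typing import Sequence


def brier_score(confidence: Sequence[float], correct: Sequence[int]) -> float:
    """Mean squared gap between stated confidence and the 0/1 outcome (lower is better)."""
    return sum((c - o) ** 2 for c, o in zip(confidence, correct)) / len(confidence)


def overconfidence_bias(confidence: Sequence[float], correct: Sequence[int]) -> float:
    """Mean confidence minus proportion correct; positive values indicate overconfidence."""
    return sum(confidence) / len(confidence) - sum(correct) / len(correct)


def sensitivity_auroc(confidence: Sequence[float], correct: Sequence[int]) -> float:
    """Probability that a correct answer received higher confidence than an incorrect one.

    Ties count as half. 0.5 means confidence does not discriminate; 1.0 is perfect.
    """
    hits = [c for c, o in zip(confidence, correct) if o == 1]
    misses = [c for c, o in zip(confidence, correct) if o == 0]
    if not hits or not misses:
        raise ValueError("need at least one correct and one incorrect answer")
    wins = sum(
        1.0 if h > m else 0.5 if h == m else 0.0
        for h in hits
        for m in misses
    )
    return wins / (len(hits) * len(misses))


# Hypothetical data: four exam items, confidence in [0, 1], correctness as 0/1.
confidence = [0.9, 0.8, 0.6, 0.7]
correct = [1, 1, 0, 0]

print(brier_score(confidence, correct))
print(overconfidence_bias(confidence, correct))
print(sensitivity_auroc(confidence, correct))
```

In this toy case the respondent is well ordered (every correct answer has higher confidence than every incorrect one) yet still overconfident on average, which illustrates why sensitivity and bias are reported separately.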
