AI · Apr 2

Do Large Language Models Mentalize When They Teach?

Sevan K. Harootonian, Mark K. Ho, Thomas L. Griffiths, Yael Niv, Ilia Sucholutsky

arXiv:2604.01594 · 62.8 · h-index: 13

Predicted impact: top 60% in AI · last 90 days · Originality: Synthesis-oriented

AI Analysis

This paper addresses how LLMs choose teaching strategies, a question relevant to AI researchers, but the contribution is incremental: it applies existing cognitive models to LLMs without introducing new methods.

The study investigates whether large language models (LLMs) reason about a learner's knowledge when teaching. In a controlled task, each model reveals one edge of a graph so that a learner can find a better path. Most LLMs performed similarly to humans, and Bayes-Optimal teaching best explained their choices, but scaffolding interventions did not reliably improve performance.

How do LLMs decide what to teach next: by reasoning about a learner's knowledge, or by using simpler rules of thumb? We test this in a controlled task previously used to study human teaching strategies. On each trial, a teacher LLM sees a hypothetical learner's trajectory through a reward-annotated directed graph and must reveal a single edge so the learner would choose a better path if they replanned. We run a range of LLMs as simulated teachers and fit their trial-by-trial choices with the same cognitive models used for humans: a Bayes-Optimal teacher that infers which transitions the learner is missing (inverse planning), weaker Bayesian variants, heuristic baselines (e.g., reward based), and non-mentalizing utility models. In a baseline experiment matched to the stimuli presented to human subjects, most LLMs perform well, show little change in strategy over trials, and their graph-by-graph performance is similar to that of humans. Model comparison (BIC) shows that Bayes-Optimal teaching best explains most models' choices. When given a scaffolding intervention, models follow auxiliary inference- or reward-focused prompts, but these scaffolds do not reliably improve later teaching on heuristic-incongruent test graphs and can sometimes reduce performance. Overall, cognitive model fits provide insight into LLM tutoring policies and show that prompt compliance does not guarantee better teaching decisions.
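The teaching task described in the abstract can be pictured with a small sketch. The code below is only an illustration under stated assumptions, not the paper's implementation: the toy graph, the function names, the placement of rewards on edges, the uniform prior over which edges the learner knows, and the assumption that the learner always takes the best path it knows of are all hypothetical choices made here. It infers which knowledge states are consistent with the observed trajectory (a very reduced form of inverse planning) and then picks the edge whose reveal gives the highest expected reward after the learner replans.

```python
from itertools import combinations

# Hypothetical toy graph (not from the paper's stimuli): edge -> reward.
EDGES = {
    ("S", "A"): 1, ("A", "G"): 1,
    ("A", "B"): 2, ("B", "G"): 3,
    ("S", "B"): 0,
}
START, GOAL = "S", "G"

def best_path(known, node=START):
    """Best (reward, path) from node to GOAL using only known edges.

    Returns None when GOAL is unreachable. Assumes the graph is acyclic.
    """
    if node == GOAL:
        return 0, (GOAL,)
    options = []
    for (u, v) in known:
        if u == node:
            rest = best_path(known, v)
            if rest is not None:
                options.append((EDGES[(u, v)] + rest[0], (node,) + rest[1]))
    return max(options) if options else None

def posterior_over_knowledge(observed_path):
    """Uniform posterior over edge subsets whose optimal plan is the observed path.

    Assumption: a uniform prior over subsets and a deterministic learner.
    """
    all_edges = list(EDGES)
    consistent = []
    for size in range(len(all_edges) + 1):
        for subset in combinations(all_edges, size):
            plan = best_path(frozenset(subset))
            if plan is not None and plan[1] == observed_path:
                consistent.append(frozenset(subset))
    return {k: 1.0 / len(consistent) for k in consistent}

def choose_edge(observed_path):
    """Reveal the edge that maximizes the learner's expected reward after replanning."""
    posterior = posterior_over_knowledge(observed_path)

    def expected_reward(edge):
        return sum(p * best_path(k | {edge})[0] for k, p in posterior.items())

    return max(EDGES, key=expected_reward)

if __name__ == "__main__":
    print(choose_edge(("S", "A", "G")))
```

In this toy case the learner was observed taking S, A, G. Five knowledge states are consistent with that choice, and revealing the edge from B to G has the highest expected payoff because in most of those states the learner already knows some way to reach B. A purely reward-based heuristic would ignore the trajectory; it is this conditioning on the observed path that the Bayes-Optimal model in the abstract captures.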
