Comparative reversal learning reveals rigid adaptation in LLMs under non-stationary uncertainty
This work addresses how to evaluate LLMs in non-stationary environments, a question of interest to AI researchers. Its contribution is incremental: it applies existing methods to new models and tasks.
The study examines how large language models (LLMs) adapt to changing contingencies in a reversal-learning task. DeepSeek-V3.2, Gemini-3, and GPT-5.2 all adapted rigidly and used positive and negative evidence asymmetrically: win-stay behavior was near ceiling, while lose-shift was attenuated relative to humans.
Non-stationary environments require agents to revise previously learned action values when contingencies change. We treat large language models (LLMs) as sequential decision policies in a two-option probabilistic reversal-learning task with three latent states and switch events triggered by either a performance criterion or timeout. We compare a deterministic fixed transition cycle to a stochastic random schedule that increases volatility, and evaluate DeepSeek-V3.2, Gemini-3, and GPT-5.2, with human data as a behavioural reference. Across models, win-stay was near ceiling while lose-shift was markedly attenuated, revealing asymmetric use of positive versus negative evidence. DeepSeek-V3.2 showed extreme perseveration after reversals and weak acquisition, whereas Gemini-3 and GPT-5.2 adapted more rapidly but still remained less loss-sensitive than humans. Random transitions amplified reversal-specific persistence across LLMs yet did not uniformly reduce total wins, demonstrating that high aggregate payoff can coexist with rigid adaptation. Hierarchical reinforcement-learning (RL) fits indicate dissociable mechanisms: rigidity can arise from weak loss learning, inflated policy determinism, or value polarisation via counterfactual suppression. These results motivate reversal-sensitive diagnostics and volatility-aware models for evaluating LLMs under non-stationary uncertainty.
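To make the task structure and the win-stay/lose-shift measures concrete, the following is a minimal sketch, not the authors' implementation. It assumes illustrative reward probabilities for the three latent states, a hypothetical criterion of 8 consecutive choices of the better option, a hypothetical timeout of 30 trials, and a simple softmax Q-learner with separate learning rates for wins and losses standing in for the decision policy. None of these values or names (`STATES`, `next_state`, `simulate`, `win_stay_lose_shift`) come from the abstract.

```python
import math
import random

# ASSUMPTION: reward probabilities (option 0, option 1) per latent state are illustrative only.
STATES = [(0.8, 0.2), (0.2, 0.8), (0.5, 0.5)]


def next_state(state, schedule, rng):
    """Fixed schedule cycles 0 -> 1 -> 2 -> 0; random schedule picks any other state."""
    if schedule == "fixed":
        return (state + 1) % len(STATES)
    return rng.choice([s for s in range(len(STATES)) if s != state])


def simulate(n_trials, schedule, alpha_win=0.5, alpha_loss=0.1, beta=5.0,
             criterion=8, timeout=30, seed=0):
    """Run a stand-in agent; criterion and timeout defaults are hypothetical."""
    rng = random.Random(seed)
    q = [0.5, 0.5]                      # action values for the two options
    state, streak, trials_in_state = 0, 0, 0
    choices, rewards = [], []
    for _ in range(n_trials):
        # Two-option softmax: probability of choosing option 1.
        p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        choice = 1 if rng.random() < p_choose_1 else 0
        reward = 1 if rng.random() < STATES[state][choice] else 0
        # Asymmetric update: a small alpha_loss means weak learning from losses.
        alpha = alpha_win if reward else alpha_loss
        q[choice] += alpha * (reward - q[choice])
        choices.append(choice)
        rewards.append(reward)
        # Switch on the performance criterion or on timeout.
        p0, p1 = STATES[state]
        better = None if p0 == p1 else (0 if p0 > p1 else 1)
        streak = streak + 1 if choice == better else 0
        trials_in_state += 1
        if streak >= criterion or trials_in_state >= timeout:
            state = next_state(state, schedule, rng)
            streak, trials_in_state = 0, 0
    return choices, rewards


def win_stay_lose_shift(choices, rewards):
    """Return (P(stay | previous win), P(shift | previous loss)); NaN if undefined."""
    wins = win_stays = losses = lose_shifts = 0
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if rewards[t - 1]:
            wins += 1
            win_stays += stayed
        else:
            losses += 1
            lose_shifts += not stayed
    win_stay = win_stays / wins if wins else float("nan")
    lose_shift = lose_shifts / losses if losses else float("nan")
    return win_stay, lose_shift


if __name__ == "__main__":
    for schedule in ("fixed", "random"):
        choices, rewards = simulate(500, schedule)
        print(schedule, win_stay_lose_shift(choices, rewards), sum(rewards))
```

In this sketch, lowering `alpha_loss` or raising `beta` are two separate ways to make the stand-in agent more persistent after a reversal, loosely mirroring the weak loss learning and inflated policy determinism named in the abstract. Counterfactual suppression is not modelled here.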