CL · May 27

Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents

arXiv:2605.28465
AI Analysis

For researchers and developers of interactive LLM agents, this work addresses a gap in how creative reasoning is evaluated and enhanced in iterative settings. It offers a new benchmark that goes beyond existing single-turn evaluations and a method that improves divergent thinking.

The paper introduces MUTATE, an interactive benchmark for evaluating divergent thinking in LLM agents, and proposes ReDNA, a method that separates divergent generation from convergent selection. ReDNA significantly outperforms prior methods in both path-level and action-level divergence, and generalizes to an external creativity environment.
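The separation between divergent generation and convergent selection can be illustrated with a minimal Python sketch. It is not ReDNA's implementation, which this summary does not describe. The `Candidate` type, its `mechanism` label, the `diverge`/`converge` helpers, and the jar-opening example are all hypothetical, and the proposals an LLM would generate are hard-coded. What the sketch shows is ordering: feasibility constraints are applied only after the full candidate pool exists, and among feasible candidates the selector prefers mechanisms the agent has not tried yet.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set


@dataclass(frozen=True)
class Candidate:
    action: str
    obj: str        # object the action relies on
    mechanism: str  # hypothetical label for how the action achieves its effect


def diverge(proposals: Iterable[Candidate]) -> List[Candidate]:
    """Divergent phase: keep every distinct proposal; no feasibility checks here."""
    pool: List[Candidate] = []
    for c in proposals:
        if c not in pool:  # drop exact duplicates only
            pool.append(c)
    return pool


def converge(pool: List[Candidate],
             constraints: List[Callable[[Candidate], bool]],
             tried_mechanisms: Set[str]) -> List[Candidate]:
    """Convergent phase: apply constraints after generation, then prefer
    candidates whose mechanism has not been tried yet."""
    feasible = [c for c in pool if all(ok(c) for ok in constraints)]
    novel = [c for c in feasible if c.mechanism not in tried_mechanisms]
    return novel or feasible  # fall back to any feasible candidate


# Toy example: these proposals stand in for an LLM's unconstrained output.
inventory = {"hand", "spoon", "hot water", "rubber band"}
proposals = [
    Candidate("twist the lid by hand", "hand", "grip"),
    Candidate("twist with a rubber band for traction", "rubber band", "grip"),
    Candidate("tap the lid edge with a spoon", "spoon", "impact"),
    Candidate("run the lid under hot water", "hot water", "thermal expansion"),
    Candidate("pry the lid with a knife", "knife", "leverage"),
    Candidate("tap the lid edge with a spoon", "spoon", "impact"),  # duplicate
]

pool = diverge(proposals)
choices = converge(pool, [lambda c: c.obj in inventory], tried_mechanisms={"grip"})
for c in choices:
    print(f"{c.action} -> {c.mechanism}")
# tap the lid edge with a spoon -> impact
# run the lid under hot water -> thermal expansion
```

Keeping `diverge` free of constraints is the design choice being illustrated: a generator that sees constraints up front may never propose mechanism-shifting ideas at all, whereas here they are filtered only at selection time.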

Divergent thinking is a core dimension of creativity, yet existing evaluations of Large Language Models (LLMs) treat it as single-turn text generation, failing to capture how an agent reasons through iterative interaction. To address this, we introduce MUTATE, an interactive benchmark designed to evaluate agentic divergent thinking at two levels: path-level, where an agent discovers multiple alternative paths to the same goal, and action-level, where individual actions require non-typical, mechanism-shifting object uses. Unlike success-only evaluations, MUTATE scores both completed paths and off-path attempts, capturing divergent reasoning that conventional success rates discard. Our experiments with frontier LLMs reveal a structural blind spot in existing frameworks: under immediate convergence pressure, agents tend to fall into immediate action fixation and fail to improve action-level divergence. To overcome this, we propose ReDNA, which separates unconstrained divergent candidate generation from convergent, constraint-based selection. ReDNA significantly outperforms prior methods at both divergence levels and generalizes effectively to an external creativity environment. We also confirm that its success stems from a qualitative enhancement of resilient divergent reasoning rather than from simple environmental exploration.
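The abstract does not give MUTATE's scoring formulas, so the following Python sketch only illustrates the idea of crediting off-path attempts alongside completed paths. Everything in it is a hypothetical assumption: the `TYPICAL` table of each object's usual mechanism, the rule that a path is identified by its set of steps, and the choice to count distinct atypical (object, mechanism) pairs as the action-level score. Setting `include_off_path=False` mimics a success-only view that keeps only actions on completed paths.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Attempt:
    obj: str
    mechanism: str
    on_completed_path: bool  # False for off-path attempts that never reached the goal


# Hypothetical lookup of each object's typical mechanism of use.
TYPICAL: Dict[str, str] = {"coin": "payment", "spoon": "scoop", "rubber band": "binding"}


def path_level(completed_paths: List[Tuple[str, ...]]) -> int:
    """Count distinct completed paths, identifying each path by its set of steps
    (so a reordering of the same steps counts once)."""
    return len({frozenset(path) for path in completed_paths})


def action_level(attempts: List[Attempt], include_off_path: bool = True) -> int:
    """Count distinct (object, mechanism) pairs that depart from the object's
    typical use. Objects missing from TYPICAL are treated as used typically."""
    atypical = {
        (a.obj, a.mechanism)
        for a in attempts
        if (include_off_path or a.on_completed_path)
        and a.mechanism != TYPICAL.get(a.obj, a.mechanism)
    }
    return len(atypical)


paths = [
    ("loosen left screw with coin", "loosen right screw with coin"),
    ("loosen right screw with coin", "loosen left screw with coin"),  # same steps, reordered
    ("pry panel with spoon",),
]
attempts = [
    Attempt("coin", "screwdriver", True),
    Attempt("rubber band", "binding", True),
    Attempt("spoon", "lever", False),  # off-path, but still a mechanism shift
    Attempt("spoon", "scoop", False),
]

print(path_level(paths))                               # 2
print(action_level(attempts))                          # 2
print(action_level(attempts, include_off_path=False))  # 1
```

In this toy case the success-only view drops the off-path `spoon`-as-lever attempt, which is the kind of divergent reasoning the abstract says conventional success rates discard.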

