LG · Jan 29

Failing to Explore: Language Models on Interactive Tasks

Mahdi JafariRaviz, Keivan Rezaei, Arshia Soltani Moakhar, Zahra Sodagar, Yize Cheng, Soheil Feizi

arXiv:2601.22345v1 · 1.4 · h-index: 49

Originality: Incremental advance

AI Analysis

The paper addresses a critical limitation of language models on interactive tasks. The advance is incremental because it builds on existing evaluation methods.

The paper tackles language models' inability to explore interactive environments effectively under a limited interaction budget. It finds that state-of-the-art models systematically under-explore and perform worse than simple heuristic baselines, with performance scaling only weakly as the budget increases.

We evaluate language models on their ability to explore interactive environments under a limited interaction budget. We introduce three parametric tasks with controllable exploration difficulty, spanning continuous and discrete environments. Across state-of-the-art models, we find systematic under-exploration and suboptimal solutions, with performance often significantly worse than simple explore-exploit heuristic baselines and scaling weakly as the budget increases. Finally, we study two lightweight interventions: splitting a fixed budget into parallel executions, which surprisingly improves performance despite a no-gain theoretical result for our tasks, and periodically summarizing the interaction history, which preserves key discoveries and further improves exploration.
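To make the idea of a simple explore-exploit heuristic under a fixed interaction budget concrete, here is a minimal sketch. It is not the authors' baseline and does not reproduce their tasks: the function name `explore_then_exploit`, the 1-D objective, the 50/50 budget split, and the step size are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, Tuple


def explore_then_exploit(
    f: Callable[[float], float],
    budget: int,
    explore_frac: float = 0.5,  # assumed split, not taken from the paper
    lo: float = 0.0,
    hi: float = 1.0,
    step: float = 0.05,         # assumed local search radius
    seed: int = 0,
) -> Tuple[float, float]:
    """Maximize f on [lo, hi] using exactly `budget` queries.

    Phase 1 (explore): uniform random probes over the whole interval.
    Phase 2 (exploit): random local perturbations around the best point so far.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    if not 0.0 <= explore_frac <= 1.0:
        raise ValueError("explore_frac must lie in [0, 1]")

    rng = random.Random(seed)
    n_explore = max(1, int(budget * explore_frac))
    best_x, best_y = lo, float("-inf")

    # Exploration: sample broadly so distant good regions are not missed.
    for _ in range(n_explore):
        x = rng.uniform(lo, hi)
        y = f(x)
        if y > best_y:
            best_x, best_y = x, y

    # Exploitation: spend the remaining queries refining the best point.
    for _ in range(budget - n_explore):
        x = min(hi, max(lo, best_x + rng.uniform(-step, step)))
        y = f(x)
        if y > best_y:
            best_x, best_y = x, y

    return best_x, best_y


if __name__ == "__main__":
    # Hypothetical objective with its maximum at x = 0.7.
    x, y = explore_then_exploit(lambda x: -(x - 0.7) ** 2, budget=40)
    print(f"best x = {x:.3f}, value = {y:.5f}")
```

An agent that under-explores corresponds roughly to a very small `explore_frac` here: it commits early to the first promising region and spends most of its budget on local refinement.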
