AI | Jan 29

Language-based Trial and Error Falls Behind in the Era of Experience

arXiv:2601.21754v2 | 1 citation | h-index: 35
Originality: Highly original
AI Analysis

This paper identifies exploration cost as the main bottleneck for LLMs in nonlinguistic environments and offers a cheaper way to handle such tasks.

LLMs struggle with unseen nonlinguistic tasks because exploring them by trial and error is expensive. The paper proposes SCOUT, a framework in which lightweight scouts do the exploration and their trajectories are then used to fine-tune the LLM. With SCOUT, a Qwen2.5-3B-Instruct model reaches an average score of 0.86, outperforming proprietary models such as Gemini-2.5-Pro (0.60) while using about 60% fewer GPU hours.

While Large Language Models (LLMs) excel in language-based agentic tasks, their applicability to unseen, nonlinguistic environments (e.g., symbolic or spatial tasks) remains limited. Previous work attributes this performance gap to the mismatch between the pretraining distribution and the testing distribution. In this work, we demonstrate that the primary bottleneck is the prohibitive cost of exploration: mastering these tasks requires extensive trial and error, which is computationally unsustainable for parameter-heavy LLMs operating in a high-dimensional semantic space. To address this, we propose SCOUT (Sub-Scale Collaboration On Unseen Tasks), a novel framework that decouples exploration from exploitation. We employ lightweight "scouts" (e.g., small MLPs) to probe environmental dynamics at a speed and scale far exceeding LLMs. The collected trajectories are used to bootstrap the LLM via Supervised Fine-Tuning (SFT), followed by multi-turn Reinforcement Learning (RL) to activate its latent world knowledge. Empirically, SCOUT enables a Qwen2.5-3B-Instruct model to achieve an average score of 0.86, significantly outperforming proprietary models, including Gemini-2.5-Pro (0.60), while saving about 60% of GPU hours.
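
As a rough illustration of the exploration half of this pipeline, the sketch below has a cheap scout learn a toy task by trial and error and then turns its successful trajectory into text pairs that could seed SFT. Everything here is an assumption made for illustration and is not taken from the paper: the 4x4 grid environment, the tabular Q-learning scout (a stand-in for the small MLPs the abstract mentions), the prompt format, and all function names. The SFT and multi-turn RL stages on the LLM are not shown.

```python
import random

# Hypothetical toy environment: a 4x4 grid, start at (0, 0), goal at (3, 3).
SIZE = 4
GOAL = (SIZE - 1, SIZE - 1)
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(state, action):
    """Apply an action, clamping to the grid; reward 1.0 only on reaching the goal."""
    dr, dc = ACTIONS[action]
    row = min(max(state[0] + dr, 0), SIZE - 1)
    col = min(max(state[1] + dc, 0), SIZE - 1)
    nxt = (row, col)
    done = nxt == GOAL
    return nxt, (1.0 if done else 0.0), done


def train_scout(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning scout (assumed stand-in for a small MLP scout)."""
    rng = random.Random(seed)
    names = list(ACTIONS)
    q = {}  # missing entries default to 1.0 (optimistic init encourages exploration)
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(50):
            if rng.random() < epsilon:
                action = rng.choice(names)
            else:
                action = max(names, key=lambda a: q.get((state, a), 1.0))
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q.get((nxt, a), 1.0) for a in names)
            old = q.get((state, action), 1.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = nxt
            if done:
                break
    return q


def rollout(q, max_steps=20):
    """Greedy rollout of the trained scout; returns (state, action) pairs and a success flag."""
    state, traj = (0, 0), []
    names = list(ACTIONS)
    for _ in range(max_steps):
        action = max(names, key=lambda a: q.get((state, a), 1.0))
        traj.append((state, action))
        state, _, done = step(state, action)
        if done:
            return traj, True
    return traj, False


def to_sft_examples(traj):
    """Serialize each step as a prompt/response pair (the text format is an assumption)."""
    return [
        {"prompt": f"Grid {SIZE}x{SIZE}. You are at {s}. Goal is {GOAL}. Next move?",
         "response": a}
        for s, a in traj
    ]


q_table = train_scout()
trajectory, success = rollout(q_table)
# Keep only successful trajectories as candidate SFT data.
sft_data = to_sft_examples(trajectory) if success else []
```

The point of the split is cost: every environment step above is a dictionary lookup rather than an LLM forward pass, so the scout can afford thousands of failed episodes before the LLM sees any data.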

