CL · Jun 2, 2025

Growing Through Experience: Scaling Episodic Grounding in Language Models

arXiv:2506.01312v1 · 6 citations · h-index: 30 · ACL
Originality: Incremental advance
AI Analysis

The paper addresses scalability and integration challenges in episodic grounding for medium-sized LMs, enabling better physical planning and question answering. The contribution appears incremental, however, because it builds on existing methods such as Monte Carlo tree search and distillation.

The paper tackles the problem of scaling episodic grounding in language models for physical planning tasks, proposing a weak-to-strong learning framework that transfers episodic behaviors from smaller to larger models, resulting in a 3.45% performance improvement over state-of-the-art proprietary LMs across diverse tasks.

Language models (LMs) require robust episodic grounding (the capacity to learn from and apply past experiences) to excel at physical planning tasks. Current episodic grounding approaches struggle with scalability and integration, limiting their effectiveness, especially for medium-sized LMs (7B parameters). While larger LMs (70-405B parameters) possess superior hierarchical representations and extensive pre-trained knowledge, they encounter a fundamental scale paradox: despite their advanced abstraction capabilities, they lack efficient mechanisms to leverage experience streams. We propose a scalable weak-to-strong episodic learning framework that effectively transfers episodic behaviors from smaller to larger LMs. This framework integrates Monte Carlo tree search for structured experience collection with a novel distillation method, preserving the inherent LM capabilities while embedding episodic memory. Experiments demonstrate our method surpasses state-of-the-art proprietary LMs by 3.45% across diverse planning and question-answering tasks. Layer-wise probing further indicates significant improvements in task alignment, especially within deeper LM layers, highlighting stable generalization even for previously unseen scenarios with increased planning complexity, conditions under which baseline methods degrade markedly.
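To make "Monte Carlo tree search for structured experience collection" more concrete, here is a minimal sketch of generic UCT-style search that logs every simulated trajectory as an episode. It is not the paper's implementation: the toy task (reach a target integer by adding 1 or 2 within a step budget), the constants `ACTIONS`, `TARGET`, and `MAX_STEPS`, and the function `collect_episodes` are all hypothetical names chosen for illustration, and no language model is involved.

```python
import math
import random

# Hypothetical toy task (an assumption, not from the paper):
# start at 0, add 1 or 2 per step, succeed by landing exactly on TARGET.
ACTIONS = (1, 2)
TARGET = 5
MAX_STEPS = 4


class Node:
    def __init__(self, state, depth, parent=None, action=None):
        self.state, self.depth = state, depth
        self.parent, self.action = parent, action
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value = 0.0     # sum of rewards backed up through this node


def is_terminal(state, depth):
    return state >= TARGET or depth >= MAX_STEPS


def reward(state):
    return 1.0 if state == TARGET else 0.0


def uct_child(node, c=1.4):
    # UCB1 score: mean reward plus an exploration bonus.
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def collect_episodes(iterations, seed=0):
    rng = random.Random(seed)
    root = Node(state=0, depth=0)
    episodes = []
    for _ in range(iterations):
        node, path = root, []
        # 1. Selection: descend while the node is fully expanded.
        while (not is_terminal(node.state, node.depth)
               and len(node.children) == len(ACTIONS)):
            node = uct_child(node)
            path.append((node.parent.state, node.action))
        # 2. Expansion: add one untried action.
        if not is_terminal(node.state, node.depth):
            action = rng.choice([a for a in ACTIONS if a not in node.children])
            child = Node(node.state + action, node.depth + 1, node, action)
            node.children[action] = child
            path.append((node.state, action))
            node = child
        # 3. Rollout: random actions until a terminal state.
        state, depth = node.state, node.depth
        while not is_terminal(state, depth):
            action = rng.choice(ACTIONS)
            path.append((state, action))
            state, depth = state + action, depth + 1
        r = reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
        # Log the full (state, action) trajectory with its outcome.
        episodes.append({"steps": path, "reward": r})
    return root, episodes


if __name__ == "__main__":
    root, episodes = collect_episodes(200)
    successes = [ep for ep in episodes if ep["reward"] == 1.0]
    print(f"{len(successes)} of {len(episodes)} episodes reached the target")
```

In a setup like the one the abstract describes, the logged episodes would presumably serve as training data for the distillation stage; how the paper filters, formats, or weights them is not stated here, so that step is left out of the sketch.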
