CL · Jun 21, 2024

How language models extrapolate outside the training data: A case study in Textualized Gridworld

arXiv:2406.15275v4 · 8 citations
Originality: Incremental advance
AI Analysis

This paper addresses the limited extrapolation capabilities of language models, a problem relevant to researchers in AI and cognitive science. It offers a novel framework, though the improvements are incremental and confined to a specific domain.

The study tested whether language models can extrapolate learned behaviors to novel, more complex environments, using a path planning task in a textualized Gridworld. Conventional methods such as next token prediction and Chain of Thought finetuning failed to extrapolate, while the proposed cognitive maps framework improved extrapolation and exhibited humanlike characteristics.

Language models' ability to extrapolate learned behaviors to novel, more complex environments beyond their training scope is largely unknown. This study introduces a path planning task in a textualized Gridworld to probe language models' extrapolation capabilities. We show that conventional approaches, including next token prediction and Chain of Thought (CoT) finetuning, fail to extrapolate in larger, unseen environments. Inspired by human cognition and dual process theory, we propose cognitive maps for path planning, a novel CoT framework that simulates humanlike mental representations. Our experiments show that cognitive maps not only enhance extrapolation to unseen environments but also exhibit humanlike characteristics through structured mental simulation and rapid adaptation. Our finding that these cognitive maps require specialized training schemes and cannot be induced through simple prompting opens up important questions about developing general-purpose cognitive maps in language models. Our comparison with exploration-based methods further illuminates the complementary strengths of offline planning and online exploration.
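To make the task setup concrete, the following is a minimal sketch of a textualized Gridworld path planning instance. It is not the paper's implementation: the text format produced by `textualize`, the `MOVES` names, and the use of breadth-first search as a reference planner are all assumptions chosen for illustration.

```python
from collections import deque

# Hypothetical move set; names are illustrative, not taken from the paper.
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def textualize(size, start, goal, walls):
    """Describe a size x size grid as plain text (an assumed format)."""
    wall_text = ", ".join(str(w) for w in sorted(walls)) or "none"
    return (
        f"Grid: {size}x{size}. Start: {start}. Goal: {goal}. "
        f"Walls: {wall_text}."
    )


def shortest_path(size, start, goal, walls):
    """Breadth-first search; returns a list of move names, or None."""
    queue = deque([start])
    parent = {start: None}  # cell -> (previous cell, move taken to reach it)
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back to the start, collecting moves in reverse order.
            path = []
            while parent[cell] is not None:
                cell, move = parent[cell]
                path.append(move)
            return path[::-1]
        for move, (dr, dc) in MOVES.items():
            nxt = (cell[0] + dr, cell[1] + dc)
            in_bounds = 0 <= nxt[0] < size and 0 <= nxt[1] < size
            if in_bounds and nxt not in walls and nxt not in parent:
                parent[nxt] = (cell, move)
                queue.append(nxt)
    return None  # goal unreachable
```

In a setup like this, the string from `textualize` would serve as the model's input and the move list from `shortest_path` as a reference answer. Extrapolation could then be probed by evaluating on values of `size` larger than any used for training.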

Code Implementations: 1 repo
