AI · CL · Dec 10, 2025

SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments

Haoye Lu, Pavan Seshadri, Kaheer Suleman

arXiv:2512.09897v1 · 3.3 · h-index: 5

Originality: Incremental advance

AI Analysis

The paper addresses efficiency issues in hierarchical planning for text-based environments. The advance is incremental, since it builds on existing LLM distillation approaches.

The paper tackles the computational cost of LLM-based hierarchical planning in text environments by introducing SCOPE, a one-shot method that uses LLM-generated subgoals only at initialization to pretrain a lightweight student model. On the TextCraft environment, SCOPE reaches a 0.56 success rate (vs. 0.52 for the ADaPT baseline) and cuts inference time from 164.4 seconds to 3.0 seconds.
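
To put the reported figures in perspective, the short calculation below derives the speedup and the success-rate gain from the four numbers quoted in the summary. The variable names are illustrative, and the derived ratios are computed here, not stated in the paper.

```python
# Figures quoted in the summary (TextCraft environment).
baseline_success = 0.52    # ADaPT success rate
scope_success = 0.56       # SCOPE success rate
baseline_seconds = 164.4   # ADaPT inference time
scope_seconds = 3.0        # SCOPE inference time

# Derived quantities (computed here; not reported in the source).
speedup = baseline_seconds / scope_seconds
absolute_gain = scope_success - baseline_success
relative_gain = absolute_gain / baseline_success

print(f"speedup: {speedup:.1f}x")              # speedup: 54.8x
print(f"absolute gain: {absolute_gain:.2f}")   # absolute gain: 0.04
print(f"relative gain: {relative_gain:.1%}")   # relative gain: 7.7%
```

On these numbers the efficiency gain is large (roughly 55x faster inference), while the success-rate gain is modest (four points).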

Long-term planning in complex, text-based environments presents significant challenges due to open-ended action spaces, ambiguous observations, and sparse feedback. Recent research suggests that large language models (LLMs) encode rich semantic knowledge about the world, which can be valuable for guiding agents in high-level reasoning and planning across both embodied and purely textual settings. However, existing approaches often depend heavily on querying LLMs during training and inference, making them computationally expensive and difficult to deploy efficiently. In addition, these methods typically employ a pretrained, unaltered LLM whose parameters remain fixed throughout training, providing no opportunity for adaptation to the target task. To address these limitations, we introduce SCOPE (Subgoal-COnditioned Pretraining for Efficient planning), a one-shot hierarchical planner that leverages LLM-generated subgoals only at initialization to pretrain a lightweight student model. Unlike prior approaches that distill LLM knowledge by repeatedly prompting the model to adaptively generate subgoals during training, our method derives subgoals directly from example trajectories. This design removes the need for repeated LLM queries, significantly improving efficiency, though at the cost of reduced explainability and potentially suboptimal subgoals. Despite their suboptimality, our results on the TextCraft environment show that LLM-generated subgoals can still serve as a strong starting point for hierarchical goal decomposition in text-based planning tasks. Compared to the LLM-based hierarchical agent ADaPT (Prasad et al., 2024), which achieves a 0.52 success rate, our method reaches 0.56 and reduces inference time from 164.4 seconds to just 3.0 seconds.
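
The "one-time teacher" idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The `Step` type, the teacher interface, the stub teacher, and the split into high-level and low-level training pairs are all assumptions made for illustration, and the example trajectory is invented. The point it shows is that the teacher is queried once per example trajectory at initialization, and everything afterward uses only the cached subgoals.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    observation: str
    action: str


Trajectory = List[Step]
# Hypothetical teacher interface: given a whole trajectory, return a list of
# (subgoal text, index of the last step that belongs to that subgoal).
Teacher = Callable[[Trajectory], List[Tuple[str, int]]]


def annotate_once(trajectories: List[Trajectory], teacher: Teacher):
    """Query the teacher exactly once per trajectory, at initialization."""
    return [teacher(traj) for traj in trajectories]


def build_datasets(trajectories, annotations):
    """Turn cached subgoal annotations into supervised pretraining pairs.

    Assumed split: a high-level student maps observation -> subgoal and a
    low-level student maps (observation, subgoal) -> action.
    """
    high, low = [], []
    for traj, segments in zip(trajectories, annotations):
        start = 0
        for subgoal, end in segments:
            high.append((traj[start].observation, subgoal))
            for step in traj[start:end + 1]:
                low.append((step.observation, subgoal, step.action))
            start = end + 1
    return high, low


# Stand-in for an LLM call: splits a trajectory into two halves.
call_log = []


def stub_teacher(traj: Trajectory) -> List[Tuple[str, int]]:
    call_log.append(len(traj))
    mid = len(traj) // 2
    return [("first half", mid - 1), ("second half", len(traj) - 1)]


# Invented four-step trajectory, for illustration only.
demo = [Step(f"obs {i}", f"act {i}") for i in range(4)]
annotations = annotate_once([demo], stub_teacher)
high_pairs, low_pairs = build_datasets([demo], annotations)
print(len(call_log), len(high_pairs), len(low_pairs))
```

In a real setup `stub_teacher` would be replaced by a single LLM prompt per trajectory, and the two lists would feed whatever lightweight student models are being pretrained. No further teacher calls are needed once `annotations` exists.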
