Textual Planning with Explicit Latent Transitions
The work addresses latency and compute bottlenecks in LLM-based textual planning, but the contribution is incremental: it builds on existing embedding methods and does not solve cross-domain transfer.
The paper addresses the cost of token-by-token generation and repeated forward passes in LLM-based planning. It proposes EmbedPlan, which uses a lightweight transition model in a frozen embedding space to predict next-state embeddings. The method achieves near-perfect interpolation performance but degrades sharply under cross-domain generalization.
Planning with LLMs is bottlenecked by token-by-token generation and repeated full forward passes, making multi-step lookahead and rollout-based search expensive in latency and compute. We propose EmbedPlan, which replaces autoregressive next-state generation with a lightweight transition model operating in a frozen language embedding space. EmbedPlan encodes natural language state and action descriptions into vectors, predicts the next-state embedding, and retrieves the next state by nearest-neighbor similarity, enabling fast planning computation without fine-tuning the encoder. We evaluate next-state prediction across nine classical planning domains using six evaluation protocols of increasing difficulty: interpolation, plan-variant, extrapolation, multi-domain, cross-domain, and leave-one-out. Results show near-perfect interpolation performance but a sharp degradation when generalization requires transfer to unseen problems or unseen domains; plan-variant evaluation indicates generalization to alternative plans rather than memorization of seen trajectories. Overall, frozen embeddings support within-domain dynamics learning after observing a domain's transitions, while transfer across domain boundaries remains a bottleneck.
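The encode, predict, and retrieve steps described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the three-dimensional hand-written vectors standing in for a frozen text encoder, and the additive rule standing in for a trained transition model. Only the overall shape of the loop (embed state and action, predict a next-state vector, pick the nearest candidate state) follows the description above.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_nearest(pred: Vector, candidates: Dict[str, Vector]) -> str:
    """Return the candidate state text whose embedding is most similar to pred."""
    return max(candidates, key=lambda text: cosine(pred, candidates[text]))


def predict_next_state(
    state_text: str,
    action_text: str,
    encode: Callable[[str], Vector],
    transition: Callable[[Vector, Vector], Vector],
    candidates: Dict[str, Vector],
) -> str:
    """Embed state and action, predict the next-state vector, retrieve by similarity."""
    pred = transition(encode(state_text), encode(action_text))
    return retrieve_nearest(pred, candidates)


# Hypothetical toy embeddings standing in for a frozen language encoder.
TOY_EMBEDDINGS: Dict[str, Vector] = {
    "A on table": [1.0, 0.0, 0.0],
    "A on B": [0.0, 1.0, 0.0],
    "A held": [0.0, 0.0, 1.0],
    "pick up A": [-1.0, 0.0, 1.0],
    "stack A on B": [0.0, 1.0, -1.0],
}

# Candidate pool for retrieval: the state descriptions only.
STATES = ["A on table", "A on B", "A held"]
CANDIDATES = {text: TOY_EMBEDDINGS[text] for text in STATES}


def toy_encode(text: str) -> Vector:
    """Table lookup; a real system would call a frozen text encoder here."""
    return TOY_EMBEDDINGS[text]


def toy_transition(state: Vector, action: Vector) -> Vector:
    """Additive stand-in for a learned transition model (an assumption)."""
    return [s + a for s, a in zip(state, action)]


# Two-step rollout carried out entirely through embedding and retrieval.
step1 = predict_next_state("A on table", "pick up A", toy_encode, toy_transition, CANDIDATES)
step2 = predict_next_state(step1, "stack A on B", toy_encode, toy_transition, CANDIDATES)
```

In this toy setup each rollout step costs one transition call and one similarity scan over the candidate pool, with no text generation, which is the source of the latency argument. The sketch says nothing about how well a trained transition model transfers to unseen domains.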