CL · Mar 25

Language Model Planners do not Scale, but do Formalizers?

Owen Jiang, Cassie Huang, Ashish Sabharwal, Li Zhang

arXiv:2603.23844 · 36.4 · h-index: 9

AI Analysis

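Relevant to researchers and practitioners working on LLM-based planning. The paper addresses the failure of LLM planners to scale with problem complexity and, rather than offering an incremental improvement, proposes a new paradigm in which the LLM generates a program generator instead of writing the formalization directly.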

The paper tackles the problem of LLMs failing to scale on complex planning tasks. It shows that LLM formalizers, which generate solver-oriented programs, out-scale LLM planners, with some retaining perfect accuracy in BlocksWorld on state spaces of size up to 10^165. It introduces a divide-and-conquer formalizing technique that improves the robustness of smaller LLM formalizers, and a new LLM-as-higher-order-formalizer paradigm for unraveling problems, in which one line of problem description corresponds to exponentially many lines of formal language.
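To give a feel for how a BlocksWorld state space can reach a size like 10^165, the sketch below counts configurations under one common convention: n labeled blocks arranged into any number of stacks on an unbounded table, with the order of blocks inside each stack mattering. This convention and the function names `blocksworld_states` and `smallest_n_reaching` are assumptions made for illustration; the listing does not say how the paper counts states.

```python
from math import comb, factorial


def blocksworld_states(n: int) -> int:
    """Count arrangements of n labeled blocks into unordered stacks.

    Assumption: unbounded table, no block held by the arm. The number of
    ways to split n labeled blocks into exactly k ordered stacks is the
    unsigned Lah number C(n-1, k-1) * n! / k!; summing over k gives the total.
    """
    return sum(
        comb(n - 1, k - 1) * factorial(n) // factorial(k)
        for k in range(1, n + 1)
    )


def smallest_n_reaching(target: int) -> int:
    """Smallest block count whose configuration count is at least target."""
    n = 1
    while blocksworld_states(n) < target:
        n += 1
    return n


if __name__ == "__main__":
    n = smallest_n_reaching(10**165)
    # Print the block count and the number of decimal digits of the state count.
    print(n, len(str(blocksworld_states(n))))
```

Python integers are arbitrary precision, so the count stays exact even at this magnitude.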

Recent work shows overwhelming evidence that LLMs, even those trained to scale their reasoning trace, perform unsatisfactorily when solving planning problems that are too complex. Whether the same conclusion holds for LLM formalizers that generate solver-oriented programs remains unknown. We systematically show that LLM formalizers greatly out-scale LLM planners, some retaining perfect accuracy in the classic BlocksWorld domain with a huge state space of size up to $10^{165}$. While the performance of smaller LLM formalizers degrades with problem complexity, we show that a divide-and-conquer formalizing technique can greatly improve their robustness. Finally, we introduce unraveling problems, where one line of problem description realistically corresponds to exponentially many lines of formal language such as the Planning Domain Definition Language (PDDL), greatly challenging LLM formalizers. We tackle this challenge by introducing a new paradigm, namely LLM-as-higher-order-formalizer, where an LLM generates a program generator. This decouples token output from the combinatorial explosion of the underlying formalization and search space.
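The higher-order idea can be illustrated with a small sketch. This is not the paper's code: the toy domain name `switch-panel`, the predicates `at` and `adjacent`, and the function `generate_pddl_problem` are all hypothetical. The assumed one-line description is "a panel has n switches, and flipping any one switch moves between configurations". Written out directly as PDDL, that takes n·2^n `adjacent` facts, whereas the generator below stays the same size for every n.

```python
from itertools import product


def generate_pddl_problem(n_switches: int) -> str:
    """Emit a PDDL problem that enumerates every switch configuration.

    Hypothetical toy domain: each configuration is an object, and two
    configurations are adjacent when they differ in exactly one switch.
    """
    configs = ["".join(bits) for bits in product("01", repeat=n_switches)]
    objects = " ".join(f"c{c}" for c in configs)

    init = []
    for c in configs:
        for i in range(n_switches):
            # Flip switch i to obtain a neighbouring configuration.
            flipped = c[:i] + ("1" if c[i] == "0" else "0") + c[i + 1:]
            init.append(f"    (adjacent c{c} c{flipped})")

    start = "c" + "0" * n_switches  # all switches off
    goal = "c" + "1" * n_switches   # all switches on
    lines = [
        "(define (problem switches)",
        "  (:domain switch-panel)",
        f"  (:objects {objects} - config)",
        "  (:init",
        f"    (at {start})",
        *init,
        "  )",
        f"  (:goal (at {goal}))",
        ")",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    for n in (2, 4, 8):
        # The generator is fixed-size; only its output grows with n.
        print(n, len(generate_pddl_problem(n).splitlines()))
```

In this framing, an LLM would emit something like the generator rather than the PDDL text itself, so the number of tokens it produces does not grow with the size of the formalization.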
