AIDec 31, 2025
Iterative Deployment Improves Planning Skills in LLMsAugusto B. Corrêa, Yoav Gelberg, Luckeciano C. Melo et al. · deepmind
We show that iterative deployment of large language models (LLMs), each fine-tuned on data carefully curated by users from the previous models' deployment, can significantly change the properties of the resultant models. By testing this mechanism on various planning domains, we observe substantial improvements in planning skills, with later models displaying emergent generalization by discovering much longer plans than the initial models. We then provide theoretical analysis showing that iterative deployment effectively implements reinforcement learning (RL) training in the outer-loop (i.e. not as part of intentional model training), with an implicit reward function. The connection to RL has two important implications: first, for the field of AI safety, as the reward function entailed by repeated deployment is not defined explicitly, and could have unexpected implications to the properties of future model deployments. Second, the mechanism highlighted here can be viewed as an alternative training regime to explicit RL, relying on data curation rather than explicit rewards.
AIApr 8, 2022
Iterative Depth-First Search for Fully Observable Non-Deterministic PlanningRamon Fraga Pereira, André G. Pereira, Frederico Messa et al.
Fully Observable Non-Deterministic (FOND) planning models uncertainty through actions with non-deterministic effects. Existing FOND planning algorithms are effective and employ a wide range of techniques. However, most of the existing algorithms are not robust for dealing with both non-determinism and task size. In this paper, we develop a novel iterative depth-first search algorithm that solves FOND planning tasks and produces strong cyclic policies. Our algorithm is explicitly designed for FOND planning, addressing more directly the non-deterministic aspect of FOND planning, and it also exploits the benefits of heuristic functions to make the algorithm more effective during the iterative searching process. We compare our proposed algorithm to well-known FOND planners, and show that it has robust performance over several distinct types of FOND domains considering different metrics.
AIOct 8, 2023
"A Nova Eletricidade: Aplicações, Riscos e Tendências da IA Moderna -- "The New Electricity": Applications, Risks, and Trends in Current AIAna L. C. Bazzan, Anderson R. Tavares, André G. Pereira et al.
The thought-provoking analogy between AI and electricity, made by computer scientist and entrepreneur Andrew Ng, summarizes the deep transformation that recent advances in Artificial Intelligence (AI) have triggered in the world. This chapter presents an overview of the ever-evolving landscape of AI, written in Portuguese. With no intent to exhaust the subject, we explore the AI applications that are redefining sectors of the economy, impacting society and humanity. We analyze the risks that may come along with rapid technological progress and future trends in AI, an area that is on the path to becoming a general-purpose technology, just like electricity, which revolutionized society in the 19th and 20th centuries. A provocativa comparação entre IA e eletricidade, feita pelo cientista da computação e empreendedor Andrew Ng, resume a profunda transformação que os recentes avanços em Inteligência Artificial (IA) têm desencadeado no mundo. Este capítulo apresenta uma visão geral pela paisagem em constante evolução da IA. Sem pretensões de exaurir o assunto, exploramos as aplicações que estão redefinindo setores da economia, impactando a sociedade e a humanidade. Analisamos os riscos que acompanham o rápido progresso tecnológico e as tendências futuras da IA, área que trilha o caminho para se tornar uma tecnologia de propósito geral, assim como a eletricidade, que revolucionou a sociedade dos séculos XIX e XX.
AINov 12, 2025
The 2025 Planning Performance of Frontier Large Language ModelsAugusto B. Corrêa, André G. Pereira, Jendrik Seipp
The capacity of Large Language Models (LLMs) for reasoning remains an active area of research, with the capabilities of frontier models continually advancing. We provide an updated evaluation of the end-to-end planning performance of three frontier LLMs as of 2025, where models are prompted to generate a plan from PDDL domain and task descriptions. We evaluate DeepSeek R1, Gemini 2.5 Pro, GPT-5 and as reference the planner LAMA on a subset of domains from the most recent Learning Track of the International Planning Competition. Our results show that on standard PDDL domains, the performance of GPT-5 in terms of solved tasks is competitive with LAMA. When the PDDL domains and tasks are obfuscated to test for pure reasoning, the performance of all LLMs degrades, though less severely than previously reported for other models. These results show substantial improvements over prior generations of LLMs, reducing the performance gap to planners on a challenging benchmark.
AIMay 15
Property-Guided LLM Program Synthesis for PlanningAugusto B. Corrêa, André G. Pereira, Jendrik Seipp
LLMs have shown impressive success in program synthesis, discovering programs that surpass prior solutions. However, these approaches rely on simple numeric scores to signal program quality, such as the value of the solution or the number of passed tests. Because a score offers no guidance on why a program failed, the system must generate and evaluate many candidates hoping some succeed, increasing LLM inference and evaluation costs. We study a different approach: property-guided LLM program synthesis. Instead of scoring programs after evaluation, we check whether a candidate satisfies a formally defined property. When the property is violated, we stop the evaluation early and provide the LLM with a concrete counterexample showing exactly how the program failed. This feedback drastically reduces both the number of program generations and the evaluation cost, and can guide the LLM to generate stronger programs. We evaluate this approach on PDDL planning domains, asking the LLM to synthesize direct heuristic functions: every state reachable by strictly improving transitions has a strictly improving successor. A heuristic with this property leads hill-climbing algorithm directly to a goal state. A counterexample-guided repair loop generates one candidate program, checks the property over a training set, and returns the first case that violates the property. We evaluate our approach on ten planning domains with an out-of-distribution test set. The synthesized heuristics are effectively direct on virtually all test tasks, and compared to the best prior generation method our approach generates seven times fewer programs per domain on average, solves more tasks without using search, and requires several orders of magnitude less computation to evaluate candidates. Whenever a problem admits a verifiable property, property-guided LLM synthesis can reduce cost and improve program quality.
AIMar 24, 2025
Classical Planning with LLM-Generated Heuristics: Challenging the State of the Art with Python CodeAugusto B. Corrêa, André G. Pereira, Jendrik Seipp
In recent years, large language models (LLMs) have shown remarkable capabilities in various artificial intelligence problems. However, they fail to plan reliably, even when prompted with a detailed definition of the planning task. Attempts to improve their planning capabilities, such as chain-of-thought prompting, fine-tuning, and explicit "reasoning" still yield incorrect plans and usually fail to generalize to larger tasks. In this paper, we show how to use LLMs to generate correct plans, even for out-of-distribution tasks of increasing size. For a given planning domain, we ask an LLM to generate several domain-dependent heuristic functions in the form of Python code, evaluate them on a set of training tasks within a greedy best-first search, and choose the strongest one. The resulting LLM-generated heuristics solve many more unseen test tasks than state-of-the-art domain-independent heuristics for classical planning. They are even competitive with the strongest learning algorithm for domain-dependent planning. These findings are especially remarkable given that our proof-of-concept implementation is based on an unoptimized Python planner and the baselines all build upon highly optimized C++ code. In some domains, the LLM-generated heuristics expand fewer states than the baselines, revealing that they are not only efficiently computable, but sometimes even more informative than the state-of-the-art heuristics. Overall, our results show that sampling a set of planning heuristic function programs can significantly improve the planning capabilities of LLMs.
AIApr 11, 2024
Goal Recognition via Linear ProgrammingFelipe Meneguzzi, Luísa R. de A. Santos, Ramon Fraga Pereira et al.
Goal Recognition is the task by which an observer aims to discern the goals that correspond to plans that comply with the perceived behavior of subject agents given as a sequence of observations. Research on Goal Recognition as Planning encompasses reasoning about the model of a planning task, the observations, and the goals using planning techniques, resulting in very efficient recognition approaches. In this article, we design novel recognition approaches that rely on the Operator-Counting framework, proposing new constraints, and analyze their constraints' properties both theoretically and empirically. The Operator-Counting framework is a technique that efficiently computes heuristic estimates of cost-to-goal using Integer/Linear Programming (IP/LP). In the realm of theory, we prove that the new constraints provide lower bounds on the cost of plans that comply with observations. We also provide an extensive empirical evaluation to assess how the new constraints improve the quality of the solution, and we found that they are especially informed in deciding which goals are unlikely to be part of the solution. Our novel recognition approaches have two pivotal advantages: first, they employ new IP/LP constraints for efficiently recognizing goals; second, we show how the new IP/LP constraints can improve the recognition of goals under both partial and noisy observability.
AIJul 4, 2019
Procedural Generation of Initial States of SokobanDâmaris S. Bento, André G. Pereira, Levi H. S. Lelis
Procedural generation of initial states of state-space search problems have applications in human and machine learning as well as in the evaluation of planning systems. In this paper we deal with the task of generating hard and solvable initial states of Sokoban puzzles. We propose hardness metrics based on pattern database heuristics and the use of novelty to improve the exploration of search methods in the task of generating initial states. We then present a system called Beta that uses our hardness metrics and novelty to generate initial states. Experiments show that Beta is able to generate initial states that are harder to solve by a specialized solver than those designed by human experts.