GenPlan: Generative Sequence Models as Adaptive Planners
GenPlan addresses the limited adaptability of sequence-based planners for AI agents, particularly in robotics and game-like simulations, by enabling generalization to out-of-distribution tasks and environments. The contribution appears incremental, as it builds on existing sequence modeling techniques.
The paper tackles the challenge of multi-task behavioral planning where agents must adapt to unseen constraints and tasks, such as discovering goals and unlocking doors, by proposing GenPlan, a stochastic and adaptive planner that uses discrete-flow models for generative sequence modeling. It demonstrates effectiveness in simulation environments, outperforming state-of-the-art methods by over 10% on adaptive planning tasks.
Sequence models have demonstrated remarkable success in behavioral planning by leveraging previously collected demonstrations. However, solving multi-task missions remains a significant challenge, particularly when the planner must adapt to unseen constraints and tasks, such as discovering goals and unlocking doors. Such behavioral planning problems are hard to solve because a) agents fail to adapt beyond the single task learned through their reward function, and b) they cannot generalize to new environments, e.g., those with walls and locked doors, when trained only in planar environments. Consequently, state-of-the-art decision-making methods are limited to missions where the required tasks are well-represented in the training demonstrations and can be solved within a short (temporal) planning horizon. To address this, we propose GenPlan: a stochastic and adaptive planner that leverages discrete-flow models for generative sequence modeling, enabling sample-efficient exploration and exploitation. This framework relies on an iterative denoising procedure to generate a sequence of goals and actions. This approach captures multi-modal action distributions and facilitates goal and task discovery, thereby generalizing to out-of-distribution tasks and environments, i.e., missions not part of the training data. We demonstrate the effectiveness of our method across multiple simulation environments. Notably, GenPlan outperforms state-of-the-art methods by over 10% on adaptive planning tasks, where the agent adapts to multi-task missions while leveraging demonstrations from single-goal-reaching tasks. Our code is available at https://github.com/CL2-UWaterloo/GenPlan.
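To make the idea of iterative denoising over a discrete sequence concrete, the following is a minimal sketch of a generic masked-token sampler. It is not GenPlan's implementation: the names MASK, toy_denoiser, and iterative_denoise, the reveal schedule, and the hand-written "model" are all assumptions made for illustration. A real planner would replace toy_denoiser with a learned network conditioned on observations and goals.

```python
import random

MASK = -1  # hypothetical sentinel for a token that has not been generated yet


def toy_denoiser(seq, num_actions):
    """Hypothetical stand-in for a learned model (assumption, not GenPlan's).

    Returns one categorical distribution over actions per position, mildly
    biased toward repeating the left neighbour's action when it is known.
    """
    probs = []
    for i in range(len(seq)):
        weights = [1.0] * num_actions
        if i > 0 and seq[i - 1] != MASK:
            weights[seq[i - 1]] += 2.0
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def iterative_denoise(horizon, num_actions, num_steps,
                      denoiser=toy_denoiser, rng=None):
    """Start from an all-masked sequence and reveal tokens over num_steps."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    rng = rng or random.Random(0)
    seq = [MASK] * horizon
    for step in range(num_steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        probs = denoiser(seq, num_actions)
        # Reveal an equal share of the remaining masked positions each step;
        # the final step reveals everything that is left.
        remaining_steps = num_steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        for i in rng.sample(masked, k):
            # Sampling (rather than taking the argmax) keeps the plan stochastic.
            seq[i] = rng.choices(range(num_actions), weights=probs[i])[0]
    return seq


plan = iterative_denoise(horizon=8, num_actions=4, num_steps=3)
print(plan)
```

Because each token is sampled from a distribution rather than chosen greedily, repeated calls with different random seeds can yield different plans, which is one simple way a sampler of this kind can represent multi-modal action distributions.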