AI | Feb 20, 2025

Making Universal Policies Universal

arXiv:2502.14777v1 | h-index: 3 | Has Code | AAMAS
Originality: Incremental advance
AI Analysis

This paper addresses cross-agent learning in reinforcement learning. The advance appears incremental, since it builds on the existing universal policy framework.

The paper tackles the challenge of developing a generalist agent for sequential decision-making tasks. It builds on the universal policy framework by training the planner on a joint dataset pooled from multiple agents, and reports an improvement of up to 42.20% in task completion accuracy compared to training on a single agent's data.

The development of a generalist agent capable of solving a wide range of sequential decision-making tasks remains a significant challenge. We address this problem in a cross-agent setup where agents share the same observation space but differ in their action spaces. Our approach builds on the universal policy framework, which decouples policy learning into two stages: a diffusion-based planner that generates observation sequences and an inverse dynamics model that assigns actions to these plans. We propose a method for training the planner on a joint dataset composed of trajectories from all agents. This method offers the benefit of positive transfer by pooling data from different agents, while the primary challenge lies in adapting shared plans to each agent's unique constraints. We evaluate our approach on the BabyAI environment, covering tasks of varying complexity, and demonstrate positive transfer across agents. Additionally, we examine the planner's generalisation ability to unseen agents and compare our method to traditional imitation learning approaches. By training on a pooled dataset from multiple agents, our universal policy achieves an improvement of up to $42.20\%$ in task completion accuracy compared to a policy trained on a dataset from a single agent.
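The two-stage decoupling described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the grid world is not BabyAI, the agent names and action sets are hypothetical, and `plan_observations` is a deterministic stand-in for the diffusion-based planner rather than a diffusion model. The point is only the structure: one shared planner produces observation sequences, and a per-agent inverse dynamics step maps each observation transition to an action from that agent's own action space, or reports that no action fits.

```python
from typing import Dict, List, Optional, Tuple

Obs = Tuple[int, int]  # shared observation space: an (x, y) grid position

# Hypothetical agents: same observations, different action spaces.
AGENT_ACTIONS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "cardinal": {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)},
    "diagonal": {"ne": (1, 1), "nw": (-1, 1), "se": (1, -1), "sw": (-1, -1)},
}


def plan_observations(start: Obs, goal: Obs) -> List[Obs]:
    """Stage 1 (stand-in planner): emit an observation sequence from start
    to goal, moving along x first and then along y. A real universal policy
    would sample this sequence from a learned diffusion model instead."""
    x, y = start
    plan = [start]
    while x != goal[0]:
        x += 1 if goal[0] > x else -1
        plan.append((x, y))
    while y != goal[1]:
        y += 1 if goal[1] > y else -1
        plan.append((x, y))
    return plan


def inverse_dynamics(agent: str, obs: Obs, next_obs: Obs) -> Optional[str]:
    """Stage 2: return the agent-specific action that moves obs to next_obs,
    or None if the transition is infeasible for this agent."""
    delta = (next_obs[0] - obs[0], next_obs[1] - obs[1])
    for name, move in AGENT_ACTIONS[agent].items():
        if move == delta:
            return name
    return None


def act_on_plan(agent: str, plan: List[Obs]) -> List[Optional[str]]:
    """Label every consecutive pair of planned observations with an action."""
    return [inverse_dynamics(agent, a, b) for a, b in zip(plan, plan[1:])]


if __name__ == "__main__":
    shared_plan = plan_observations((0, 0), (2, 1))
    for agent in AGENT_ACTIONS:
        print(agent, act_on_plan(agent, shared_plan))
```

In this toy setup the "cardinal" agent can execute the shared plan, while the "diagonal" agent gets `None` for every step. That mirrors the challenge the abstract names: a plan drawn from pooled data still has to be adapted to each agent's own constraints.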

Code Implementations: 1 repo
