AI · Feb 10

Efficient Unsupervised Environment Design through Hierarchical Policy Representation Learning

arXiv:2602.09813v1 · h-index: 5
Originality: Incremental advance
AI Analysis

The paper addresses inefficient curriculum generation for AI agents when training opportunities are limited. It is an incremental improvement over existing UED methods.

The paper tackles Unsupervised Environment Design (UED) in resource-constrained scenarios. It introduces a hierarchical MDP framework in which a teacher uses student policy representations to choose training environments, and a generative model supplies synthetic data to reduce the number of teacher-student interactions. In experiments, the method outperforms baselines while using fewer interactions.

Unsupervised Environment Design (UED) has emerged as a promising approach to developing general-purpose agents through automated curriculum generation. Popular UED methods focus on Open-Endedness, where teacher algorithms rely on stochastic processes for infinite generation of useful environments. This assumption becomes impractical in resource-constrained scenarios where teacher-student interaction opportunities are limited. To address this challenge, we introduce a hierarchical Markov Decision Process (MDP) framework for environment design. Our framework features a teacher agent that leverages student policy representations derived from discovered evaluation environments, enabling it to generate training environments based on the student's capabilities. To improve efficiency, we incorporate a generative model that augments the teacher's training dataset with synthetic data, reducing the need for teacher-student interactions. In experiments across several domains, we show that our method outperforms baseline approaches while requiring fewer teacher-student interactions in a single episode. The results suggest the applicability of our approach in settings where training opportunities are limited.
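
The teacher-student loop described in the abstract can be illustrated with a toy sketch. This is not the paper's algorithm: the scalar `skill`, the `evaluate` and `train` rules, the greedy `teacher_propose` heuristic, and all names below are assumptions made for illustration, and the generative-model augmentation is omitted entirely. The sketch only shows the idea of representing a student by its scores on a fixed set of evaluation environments and letting the teacher pick the next training environment from that representation under a small interaction budget.

```python
from dataclasses import dataclass


@dataclass
class Student:
    # Hypothetical scalar standing in for a learned policy.
    skill: float = 0.0

    def evaluate(self, difficulty: float) -> float:
        """Toy score in [0, 1]; reaches 1.0 once skill >= difficulty."""
        return max(0.0, min(1.0, 1.0 + self.skill - difficulty))

    def train(self, difficulty: float) -> None:
        """Toy learning rule: the student only learns from environments
        at most 1.0 above its current skill, and then closes the gap."""
        gap = difficulty - self.skill
        if 0.0 <= gap <= 1.0:
            self.skill += gap


def policy_representation(student, eval_difficulties):
    """Represent the student by its scores on the evaluation environments."""
    return [student.evaluate(d) for d in eval_difficulties]


def teacher_propose(representation, eval_difficulties):
    """Assumed heuristic teacher: pick the easiest level not yet mastered."""
    for difficulty, score in zip(eval_difficulties, representation):
        if score < 1.0:
            return difficulty
    return eval_difficulties[-1]


def run_curriculum(budget, eval_difficulties):
    """Run `budget` teacher-student interactions and return the student."""
    student = Student()
    for _ in range(budget):
        representation = policy_representation(student, eval_difficulties)
        env = teacher_propose(representation, eval_difficulties)
        student.train(env)
    return student


if __name__ == "__main__":
    levels = [0.5, 1.0, 1.5, 2.0]
    print(run_curriculum(budget=4, eval_difficulties=levels).skill)
```

In this toy setting a teacher that always proposed the hardest level (2.0) would teach the student nothing, because the gap exceeds what the assumed learning rule can absorb; conditioning on the student's representation is what makes a four-interaction budget sufficient here.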

