Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

arXiv:2601.19810v1h-index: 1

Originality Incremental advance

AI Analysis

This work addresses the problem of efficient exploration and adaptation for AI agents in broad task distributions, though it is incremental as it builds on existing meta-learning and goal-generation approaches.

The paper tackles the challenge of unsupervised pre-training for reinforcement learning agents to improve exploration and adaptation in downstream tasks, resulting in ULEE, a method that achieves improved zero-shot and few-shot performance on XLand-MiniGrid benchmarks, outperforming baselines like learning from scratch and DIAYN.

Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula.

View on arXiv PDF

Similar