LG · AI · Dec 8, 2023

Backward Learning for Goal-Conditioned Policies

arXiv:2312.05044v2 · 3 citations · h-index: 5
AI Analysis

This paper addresses the challenge of reward-free policy learning for goal-conditioned tasks, though it appears incremental because it builds on existing imitation learning and world modeling techniques.

The paper tackles the problem of learning reinforcement learning policies without rewards. It proposes a multi-step procedure that learns a backward world model, generates goal-reaching backward trajectories, improves them with shortest-path finding, and trains a policy via imitation learning. The authors report consistent goal-reaching in a deterministic maze environment with 64x64 pixel bird's-eye images.

Can we learn policies in reinforcement learning without rewards? Can we learn a policy just by trying to reach a goal state? We answer these questions positively by proposing a multi-step procedure that first learns a world model that goes backward in time, secondly generates goal-reaching backward trajectories, thirdly improves those sequences using shortest path finding algorithms, and finally trains a neural network policy by imitation learning. We evaluate our method on a deterministic maze environment where the observations are $64\times 64$ pixel bird's eye images and can show that it consistently reaches several goals.
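
The four steps in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a small hand-written grid maze instead of $64\times 64$ images, replaces the learned backward world model with the known inverse of the grid dynamics, uses breadth-first search as the shortest-path step, and stands in a lookup table for the neural network policy. The names `MAZE`, `backward_step`, `backward_shortest_policy`, and `rollout` are hypothetical.

```python
from collections import deque

# Hypothetical 5x5 maze: '#' is a wall, '.' is free space, 'G' marks the goal.
MAZE = [
    "#####",
    "#..G#",
    "#.#.#",
    "#...#",
    "#####",
]
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def is_free(cell):
    r, c = cell
    return MAZE[r][c] != "#"


def backward_step(state, action):
    """Assumed stand-in for a learned backward model: return the
    predecessor cell from which `action` leads to `state`, or None."""
    dr, dc = ACTIONS[action]
    prev = (state[0] - dr, state[1] - dc)
    return prev if is_free(prev) else None


def backward_shortest_policy(goal):
    """Breadth-first search backward from the goal. Each state is first
    reached along a shortest path, so the stored action is the first move
    of a shortest route to the goal. The resulting (state, action) pairs
    are what an imitation learner would be trained on."""
    policy = {}
    visited = {goal}
    queue = deque([goal])
    while queue:
        state = queue.popleft()
        for action in ACTIONS:
            prev = backward_step(state, action)
            if prev is not None and prev not in visited:
                visited.add(prev)
                policy[prev] = action
                queue.append(prev)
    return policy


def rollout(policy, start, goal, max_steps=50):
    """Follow the policy forward in time from `start` until `goal`."""
    state, path = start, [start]
    for _ in range(max_steps):
        if state == goal:
            break
        dr, dc = ACTIONS[policy[state]]
        state = (state[0] + dr, state[1] + dc)
        path.append(state)
    return path


GOAL = (1, 3)
policy = backward_shortest_policy(GOAL)
path = rollout(policy, (3, 1), GOAL)
print(path)
```

In this sketch the table `policy` plays the role of the imitation-learned network: it maps every reachable state to an action without using any reward signal. Only the goal state and the backward dynamics are needed.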

Code Implementations: 1 repo
