AI · LG · NE · RO · Jul 17, 2017

Reverse Curriculum Generation for Reinforcement Learning

arXiv:1707.05300v3 · 521 citations
Originality: Highly original
AI Analysis

This work addresses efficient robot training on complex tasks without expert demonstrations or manual reward shaping, though its contribution to automating curriculum design is incremental.

The paper tackles the challenge of sparse rewards in goal-oriented reinforcement learning tasks by proposing a reverse curriculum generation method that trains agents from the goal backward, automatically adapting start states to agent performance, and demonstrates its effectiveness on navigation and manipulation problems unsolvable by prior methods.

Many relevant tasks require an agent to reach a certain state, or to manipulate objects into a desired configuration. For example, we might want a robot to align and assemble a gear onto an axle or insert and turn a key in a lock. These goal-oriented tasks present a considerable challenge for reinforcement learning, since their natural reward function is sparse and prohibitive amounts of exploration are required to reach the goal and receive some learning signal. Past approaches tackle these problems by exploiting expert demonstrations or by manually designing a task-specific reward shaping function to guide the learning agent. Instead, we propose a method to learn these tasks without requiring any prior knowledge other than obtaining a single state in which the task is achieved. The robot is trained in reverse, gradually learning to reach the goal from a set of start states increasingly far from the goal. Our method automatically generates a curriculum of start states that adapts to the agent's performance, leading to efficient training on goal-oriented tasks. We demonstrate our approach on difficult simulated navigation and fine-grained manipulation problems, not solvable by state-of-the-art reinforcement learning methods.
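The abstract says only that the curriculum of start states "adapts to the agent's performance". The Python below is a minimal sketch of one way such a reverse curriculum could work, under stated assumptions rather than the authors' code. It assumes that new candidate starts are produced by short random walks around the current starts, that a start is kept when the agent's estimated success rate from it lies in a band `[r_min, r_max]` (the values 0.1 and 0.9 here are illustrative), and that a few earlier starts are replayed to limit forgetting. All names (`sample_nearby`, `select_good_starts`, `ToyAgent`, `reverse_curriculum`) are hypothetical. `ToyAgent` is a stand-in with a hand-coded success model, not a reinforcement learner.

```python
import math
import random
from typing import Callable, List, Tuple

State = Tuple[float, float]  # toy 2-D position; the goal is a single known state


def sample_nearby(starts: List[State], n_new: int, n_steps: int,
                  step_size: float, rng: random.Random) -> List[State]:
    """Grow the start set by short random walks from existing starts.

    Assumption: Gaussian steps in state space; a real robot would instead
    apply random actions. Every state visited on a walk becomes a candidate.
    """
    candidates: List[State] = []
    while len(candidates) < n_new:
        x, y = rng.choice(starts)
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_size)
            y += rng.gauss(0.0, step_size)
            candidates.append((x, y))
    return candidates[:n_new]


def select_good_starts(starts: List[State], success_rate: Callable[[State], float],
                       r_min: float, r_max: float) -> List[State]:
    """Keep starts that are neither already mastered nor still hopeless."""
    return [s for s in starts if r_min <= success_rate(s) <= r_max]


class ToyAgent:
    """Hypothetical stand-in for an RL policy: it reliably reaches the goal
    (the origin) from within `radius`, and its success rate falls off
    linearly over `falloff` units beyond that."""

    def __init__(self, radius: float = 0.5, falloff: float = 1.0, lr: float = 0.2):
        self.radius, self.falloff, self.lr = radius, falloff, lr

    def success_rate(self, s: State) -> float:
        excess = math.hypot(*s) - self.radius
        return 1.0 if excess <= 0 else max(0.0, 1.0 - excess / self.falloff)

    def train(self, starts: List[State]) -> None:
        # Pretend learning: competence grows with the mean difficulty trained on.
        hardness = sum(1.0 - self.success_rate(s) for s in starts) / len(starts)
        self.radius += self.lr * hardness


def reverse_curriculum(goal: State, agent: ToyAgent, iterations: int,
                       n_new: int = 200, n_old: int = 50, n_steps: int = 10,
                       step_size: float = 0.3, r_min: float = 0.1,
                       r_max: float = 0.9, seed: int = 0) -> List[State]:
    """Train outward from the goal; return the final frontier of start states."""
    rng = random.Random(seed)
    starts, old_starts = [goal], [goal]
    for _ in range(iterations):
        candidates = sample_nearby(starts, n_new, n_steps, step_size, rng)
        # Replay a few earlier starts so the agent does not forget them.
        candidates += rng.sample(old_starts, min(n_old, len(old_starts)))
        agent.train(candidates)
        good = select_good_starts(candidates, agent.success_rate, r_min, r_max)
        if good:  # keep the previous frontier if nothing lands in the band
            starts = good
            old_starts.extend(good)
    return starts


if __name__ == "__main__":
    agent = ToyAgent()
    frontier = reverse_curriculum(goal=(0.0, 0.0), agent=agent, iterations=20)
    print(f"competence radius {agent.radius:.2f}, frontier size {len(frontier)}")
```

The band filter is the design choice that makes the curriculum adaptive in this sketch. Starts the agent already solves almost always, or almost never, carry little learning signal under a sparse reward. Keeping only intermediate starts moves the frontier outward from the goal as the agent improves.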

