LGAIJun 10, 2015

The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning

arXiv:1506.03379v26 citations
AI Analysis

This work addresses the problem of efficient cross-task exploration for lifelong reinforcement learning agents, offering a theoretical foundation with practical applications in domains like human-robot interaction, though it is incremental as it builds on existing lifelong RL frameworks.

The paper tackles the challenge of transferring knowledge across related tasks in lifelong reinforcement learning by formulating a novel online coupon-collector problem and providing an optimal algorithm, resulting in a new lifelong RL algorithm with significantly reduced sample complexity compared to single-task learning, even under adversarial task sequences.

Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL). Despite much encouraging empirical evidence, there has been little theoretical analysis. In this paper, we study a class of lifelong RL problems: the agent solves a sequence of tasks modeled as finite Markov decision processes (MDPs), each of which is from a finite set of MDPs with the same state/action sets and different transition/reward functions. Motivated by the need for cross-task exploration in lifelong learning, we formulate a novel online coupon-collector problem and give an optimal algorithm. This allows us to develop a new lifelong RL algorithm, whose overall sample complexity in a sequence of tasks is much smaller than single-task learning, even if the sequence of tasks is generated by an adversary. Benefits of the algorithm are demonstrated in simulated problems, including a recently introduced human-robot interaction problem.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes