AIAug 1, 2017

Hierarchical Subtask Discovery With Non-Negative Matrix Factorization

arXiv:1708.00463v12 citations
Originality Incremental advance
AI Analysis

This addresses the problem of subtask discovery for hierarchical reinforcement learning in complex domains, offering a novel method that is incremental in improving decomposition flexibility.

The paper tackles the challenge of learning hierarchical decompositions into subtasks in reinforcement learning by presenting a novel algorithm based on non-negative matrix factorization within the MLMDP framework, showing that it learns intuitive decompositions across various domains with features like distributed patterns and task-dependent hierarchies.

Hierarchical reinforcement learning methods offer a powerful means of planning flexible behavior in complicated domains. However, learning an appropriate hierarchical decomposition of a domain into subtasks remains a substantial challenge. We present a novel algorithm for subtask discovery, based on the recently introduced multitask linearly-solvable Markov decision process (MLMDP) framework. The MLMDP can perform never-before-seen tasks by representing them as a linear combination of a previously learned basis set of tasks. In this setting, the subtask discovery problem can naturally be posed as finding an optimal low-rank approximation of the set of tasks the agent will face in a domain. We use non-negative matrix factorization to discover this minimal basis set of tasks, and show that the technique learns intuitive decompositions in a variety of domains. Our method has several qualitatively desirable features: it is not limited to learning subtasks with single goal states, instead learning distributed patterns of preferred states; it learns qualitatively different hierarchical decompositions in the same domain depending on the ensemble of tasks the agent will face; and it may be straightforwardly iterated to obtain deeper hierarchical decompositions.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes