LG · AI · May 25, 2022

Toward Discovering Options that Achieve Faster Planning

arXiv:2205.12515v2 · 6 citations · h-index: 74
Originality: Incremental advance
AI Analysis

This work addresses computational efficiency in planning for reinforcement learning, though it is incremental as it builds on existing option discovery methods.

The paper tackles the problem of discovering options that speed up planning in reinforcement learning. It proposes a new objective that favors options requiring fewer elementary planning operations, and shows in a variant of the four-room domain that the resulting algorithm matches human-designed options in both objective value and planning computation, while producing options that behave intuitively.

We propose a new objective for option discovery that emphasizes the computational advantage of using options in planning. In a sequential machine, the speed of planning is proportional to the number of elementary operations used to achieve a good policy. For episodic tasks, the number of elementary operations depends on the number of options composed by the policy in an episode and the number of options being considered at each decision point. To reduce the amount of computation in planning, for a given set of episodic tasks and a given number of options, our objective prefers options with which it is possible to achieve a high return by composing few options, and also prefers a smaller set of options to choose from at each decision point. We develop an algorithm that optimizes the proposed objective. In a variant of the classic four-room domain, we show that 1) a higher objective value is typically associated with a smaller number of elementary planning operations used by the option-value iteration algorithm to obtain a near-optimal value function, 2) our algorithm achieves an objective value that matches that achieved by two human-designed options, 3) the amount of computation used by option-value iteration with options discovered by our algorithm matches that with the human-designed options, and 4) the options produced by our algorithm also make intuitive sense: they seem to move to and terminate at the entrances of rooms.
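The link between options and planning cost can be illustrated with a minimal sketch. This is not the paper's algorithm or its four-room experiment: the corridor task, the option names (`step_left`, `step_right`, `run_to_goal`), the pessimistic value initialization, and the choice to count one elementary operation per (state, option) backup are all assumptions made for illustration. The sketch runs synchronous option-value iteration on a deterministic corridor and counts backups with and without one temporally extended option.

```python
from typing import Callable, List, Tuple

# Assumption: an option model maps a state to
# (cumulative reward, state in which the option terminates).
OptionModel = Callable[[int], Tuple[float, int]]

N_STATES = 10            # hypothetical corridor: states 0..9
GOAL = N_STATES - 1      # state 9 is the terminal goal
PESSIMISTIC = -1000.0    # assumed initial value for non-terminal states


def step_left(s: int) -> Tuple[float, int]:
    # Primitive action treated as a one-step option.
    return -1.0, max(s - 1, 0)


def step_right(s: int) -> Tuple[float, int]:
    # Primitive action treated as a one-step option.
    return -1.0, min(s + 1, GOAL)


def run_to_goal(s: int) -> Tuple[float, int]:
    # Temporally extended option: keep moving right until the goal.
    return -float(GOAL - s), GOAL


def option_value_iteration(
    options: List[OptionModel], tol: float = 1e-9
) -> Tuple[List[float], int]:
    """Synchronous sweeps; returns (values, number of option backups)."""
    values = [PESSIMISTIC] * GOAL + [0.0]  # terminal state has value 0
    operations = 0
    while True:
        new_values = values[:]
        for s in range(GOAL):              # the terminal state is never backed up
            backups = []
            for model in options:
                reward, next_state = model(s)
                backups.append(reward + values[next_state])
                operations += 1            # one elementary operation per backup
            new_values[s] = max(backups)
        if max(abs(a - b) for a, b in zip(new_values, values)) < tol:
            return new_values, operations
        values = new_values


primitive_values, primitive_ops = option_value_iteration([step_left, step_right])
option_values, option_ops = option_value_iteration(
    [step_left, step_right, run_to_goal]
)
print("primitive only:", primitive_ops, "backups")
print("with run_to_goal:", option_ops, "backups")
```

In this toy setting both runs reach the same value function, but adding `run_to_goal` cuts the number of sweeps enough to outweigh the extra backup per state. This mirrors the two factors named in the abstract: how many options must be composed to reach a high return, and how many options are considered at each decision point.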

