LG · Feb 7, 2021

An Analysis of Frame-skipping in Reinforcement Learning

arXiv:2102.03718v1 · 28 citations
Originality: Incremental advance
AI Analysis

This work is significant for practitioners and researchers in reinforcement learning, providing a theoretical analysis and empirical support for the common practice of frame-skipping, which can improve policy quality and reduce computational costs.

This paper investigates the role of frame-skipping (parameter $d$) in reinforcement learning, where agents sense state at intervals of $d$ time steps. The authors observe that frame-skipping does not affect asymptotic consistency for policy evaluation and can benefit learning, especially when using action-repetition. They define the "price of inertia" to bound the loss from action-repetition, showing it can be offset by gains from a smaller task horizon.

In the practice of sequential decision making, agents are often designed to sense state at regular intervals of $d$ time steps, $d > 1$, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially better policies when run with $d > 1$ -- in fact with $d$ even as high as $180$. In this paper, we investigate the role of the parameter $d$ in RL; $d$ is called the "frame-skip" parameter, since states in the Atari domain are images. For evaluating a fixed policy, we observe that under standard conditions, frame-skipping does not affect asymptotic consistency. Depending on other parameters, it can possibly even benefit learning. To use $d > 1$ in the control setting, one must first specify which $d$-step open-loop action sequences can be executed in between sensing steps. We focus on "action-repetition", the common restriction of this choice to $d$-length sequences of the same action. We define a task-dependent quantity called the "price of inertia", in terms of which we upper-bound the loss incurred by action-repetition. We show that this loss may be offset by the gain brought to learning by a smaller task horizon. Our analysis is supported by experiments on different tasks and learning algorithms.
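The action-repetition scheme described above can be illustrated with a minimal sketch. It is not the authors' code: the `ChainEnv` toy task, the `ActionRepeat` wrapper, the `run_episode` helper, and the `reset`/`step` interface are all hypothetical names chosen for illustration, and rewards are summed without discounting for simplicity. The wrapper senses state only once every $d$ time steps and repeats the chosen action in between, so the agent makes fewer decisions per episode.

```python
from typing import Callable, Tuple


class ChainEnv:
    """Hypothetical toy task: walk along positions 0..n-1, reward 1 at the end."""

    def __init__(self, n: int = 10) -> None:
        self.n = n
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # action is +1 (right) or -1 (left); position is clipped at 0.
        self.pos = max(0, self.pos + action)
        done = self.pos >= self.n - 1
        reward = 1.0 if done else 0.0
        return self.pos, reward, done


class ActionRepeat:
    """Frame-skip wrapper (illustrative): repeat each action for d time steps.

    The agent observes only the state reached after the d-th step; the
    intermediate states are ignored. Rewards collected in between are
    summed (undiscounted, an assumption made for this sketch).
    """

    def __init__(self, env: ChainEnv, d: int) -> None:
        if d < 1:
            raise ValueError("frame-skip d must be at least 1")
        self.env = env
        self.d = d

    def reset(self) -> int:
        return self.env.reset()

    def step(self, action: int) -> Tuple[int, float, bool]:
        total = 0.0
        for _ in range(self.d):
            state, reward, done = self.env.step(action)
            total += reward
            if done:  # stop repeating once the episode has ended
                break
        return state, total, done


def run_episode(env, policy: Callable[[int], int],
                max_decisions: int = 1000) -> Tuple[float, int]:
    """Run one episode; return (total reward, number of decisions made)."""
    state = env.reset()
    total, decisions, done = 0.0, 0, False
    while not done and decisions < max_decisions:
        state, reward, done = env.step(policy(state))
        total += reward
        decisions += 1
    return total, decisions
```

Calling `run_episode(ChainEnv(10), lambda s: 1)` and `run_episode(ActionRepeat(ChainEnv(10), d=3), lambda s: 1)` with an always-right policy shows the effect on the horizon: the same task is solved with fewer decisions when $d = 3$. In this toy task repeating actions costs nothing; in general, committing to the same action for $d$ steps can lose reward, which is the loss the paper bounds through the "price of inertia".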
