LG · Feb 7, 2021

An Analysis of Frame-skipping in Reinforcement Learning

Shivaram Kalyanakrishnan, Siddharth Aravindan, Vishwajeet Bagdawat, Varun Bhatt, Harshith Goka, Archit Gupta, Kalpesh Krishna, Vihari Piratla

arXiv:2102.03718v114.128 citations

Originality: Incremental advance

AI Analysis

This work is significant for practitioners and researchers in reinforcement learning, providing a theoretical analysis and empirical support for the common practice of frame-skipping, which can improve policy quality and reduce computational costs.

This paper investigates the role of the frame-skip parameter $d$ in reinforcement learning, where agents sense state only once every $d$ time steps. For policy evaluation, the authors observe that frame-skipping does not affect asymptotic consistency under standard conditions and, depending on other parameters, may even benefit learning. For control with action-repetition, they define the "price of inertia" and use it to upper-bound the loss incurred by repeating actions, showing that this loss may be offset by the gain a smaller task horizon brings to learning.

In the practice of sequential decision making, agents are often designed to sense state at regular intervals of $d$ time steps, $d > 1$, ignoring state information in between sensing steps. While it is clear that this practice can reduce sensing and compute costs, recent results indicate a further benefit. On many Atari console games, reinforcement learning (RL) algorithms deliver substantially better policies when run with $d > 1$ -- in fact with $d$ even as high as $180$. In this paper, we investigate the role of the parameter $d$ in RL; $d$ is called the "frame-skip" parameter, since states in the Atari domain are images. For evaluating a fixed policy, we observe that under standard conditions, frame-skipping does not affect asymptotic consistency. Depending on other parameters, it can possibly even benefit learning. To use $d > 1$ in the control setting, one must first specify which $d$-step open-loop action sequences can be executed in between sensing steps. We focus on "action-repetition", the common restriction of this choice to $d$-length sequences of the same action. We define a task-dependent quantity called the "price of inertia", in terms of which we upper-bound the loss incurred by action-repetition. We show that this loss may be offset by the gain brought to learning by a smaller task horizon. Our analysis is supported by experiments on different tasks and learning algorithms.
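As a rough illustration of action-repetition as described in the abstract, the sketch below wraps an environment so that the agent senses state only once every $d$ steps and the chosen action is repeated in between. It is not the authors' code: the `ToyChain` environment, the `ActionRepeat` wrapper, the `reset`/`step` interface, and the choice to discount rewards inside each block of $d$ steps by `gamma` are all assumptions made for the example.

```python
from typing import Tuple


class ToyChain:
    """Hypothetical 1-D chain task: start at 0, reward 1.0 on reaching `goal`."""

    def __init__(self, goal: int = 6) -> None:
        self.goal = goal
        self.pos = 0

    def reset(self) -> int:
        self.pos = 0
        return self.pos

    def step(self, action: int) -> Tuple[int, float, bool]:
        # `action` is -1 (left) or +1 (right); the episode ends at the goal.
        self.pos += action
        done = self.pos == self.goal
        return self.pos, (1.0 if done else 0.0), done


class ActionRepeat:
    """Assumed wrapper: repeat each action d times, sensing state only at the end."""

    def __init__(self, env: ToyChain, d: int, gamma: float = 1.0) -> None:
        if d < 1:
            raise ValueError("frame-skip d must be at least 1")
        self.env = env
        self.d = d
        self.gamma = gamma

    def reset(self) -> int:
        return self.env.reset()

    def step(self, action: int) -> Tuple[int, float, bool]:
        total, discount = 0.0, 1.0
        for _ in range(self.d):
            # Intermediate states are overwritten, never shown to the agent.
            state, reward, done = self.env.step(action)
            total += discount * reward  # discounted reward within the block
            discount *= self.gamma
            if done:
                break
        return state, total, done


env = ActionRepeat(ToyChain(goal=6), d=3)
env.reset()
first = env.step(+1)   # agent next senses position 3
second = env.step(+1)  # agent next senses position 6, the goal
```

With $d = 3$ the agent makes two decisions to reach the goal instead of six, which is the sense in which frame-skipping shortens the task horizon. An agent learning on top of such a wrapper would typically discount across decisions by $\gamma^d$ rather than $\gamma$.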
