LG · Dec 31, 2021

Single-Shot Pruning for Offline Reinforcement Learning

Samin Yeasar Arnob, Riyasat Ohib, Sergey Plis, Doina Precup

arXiv:2112.15579v1 · 14.627 citations

Originality: Incremental advance

AI Analysis

The paper addresses the computational inefficiency that researchers and practitioners face in offline RL, though it is incremental because it adapts existing pruning methods to a new context.

The paper tackles the problem of large neural networks in offline reinforcement learning by applying single-shot pruning before training, showing that with 95% of weights pruned, performance is retained in most experiments.

Deep Reinforcement Learning (RL) is a powerful framework for solving complex real-world problems. Large neural networks employed in the framework are traditionally associated with better generalization capabilities, but their increased size entails the drawbacks of extensive training duration, substantial hardware resources, and longer inference times. One way to tackle this problem is to prune neural networks, leaving only the necessary parameters. State-of-the-art concurrent pruning techniques for imposing sparsity perform demonstrably well in applications where data distributions are fixed. However, they have not yet been substantially explored in the context of RL. We close the gap between RL and single-shot pruning techniques and present a general pruning approach for Offline RL. We leverage a fixed dataset to prune neural networks before the start of RL training. We then run experiments that vary the network sparsity level and evaluate the validity of pruning-at-initialization techniques in continuous control tasks. Our results show that with 95% of the network weights pruned, Offline-RL algorithms can still retain performance in the majority of our experiments. To the best of our knowledge, no prior work utilizing pruning in RL retained performance at such high levels of sparsity. Moreover, pruning-at-initialization techniques can be easily integrated into any existing Offline-RL algorithm without changing the learning objective.
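To make the idea of pruning before training on a fixed dataset concrete, here is a minimal NumPy sketch. It rests on several assumptions that are not stated in the abstract: the saliency score is the SNIP-style |w · ∂L/∂w|, the loss is a simple behavior-cloning mean squared error on a synthetic batch, and the network is a two-layer tanh policy. The function name `single_shot_masks` and all shapes are hypothetical, chosen only for illustration.

```python
import numpy as np

def single_shot_masks(weights, grads, sparsity):
    """Binary masks that keep the top (1 - sparsity) fraction of weights,
    ranked globally across layers by the saliency |w * dL/dw|."""
    flat = np.concatenate([np.abs(w * g).ravel() for w, g in zip(weights, grads)])
    n_keep = max(1, int(round((1.0 - sparsity) * flat.size)))
    keep = np.zeros(flat.size, dtype=bool)
    keep[np.argsort(flat)[-n_keep:]] = True  # exactly n_keep survivors
    # Split the flat mask back into one mask per layer.
    sizes = [w.size for w in weights]
    parts = np.split(keep, np.cumsum(sizes)[:-1])
    return [p.reshape(w.shape).astype(w.dtype) for p, w in zip(parts, weights)]

# Synthetic stand-in for one batch drawn from a fixed offline dataset (assumption).
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 17))
actions = rng.normal(size=(256, 6))

# Randomly initialized two-layer policy: action = tanh(state @ W1) @ W2.
W1 = rng.normal(scale=0.1, size=(17, 64))
W2 = rng.normal(scale=0.1, size=(64, 6))

# One forward/backward pass of the MSE loss, with gradients written out by hand.
hidden = np.tanh(states @ W1)
pred = hidden @ W2
d_pred = 2.0 * (pred - actions) / pred.size       # dL/d(pred)
g_W2 = hidden.T @ d_pred                          # dL/dW2
d_hidden = (d_pred @ W2.T) * (1.0 - hidden ** 2)  # backprop through tanh
g_W1 = states.T @ d_hidden                        # dL/dW1

# Prune once, before any training step, at 95% sparsity.
masks = single_shot_masks([W1, W2], [g_W1, g_W2], sparsity=0.95)
W1_sparse, W2_sparse = W1 * masks[0], W2 * masks[1]
kept = sum(int(m.sum()) for m in masks)
print(f"kept {kept} of {W1.size + W2.size} weights")
```

In a sketch like this, the masks would stay fixed for the rest of training: weights (or their gradients) are multiplied by the masks after every update so that pruned connections remain at zero, while the learning objective itself is left unchanged.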

