Bayesian Sparsification of Gated Recurrent Neural Networks
This work addresses efficiency and interpretability challenges in recurrent neural networks for machine learning practitioners, but it is incremental as it extends existing Bayesian sparsification techniques to gated architectures.
The paper tackles the problem of sparsifying gated recurrent neural networks, such as LSTM, by applying Bayesian methods to sparsify weights, neurons, and preactivations of gates, resulting in faster forward passes and improved compression, with interpretable gate sparsity patterns that vary by task.
Bayesian methods have been successfully applied to sparsify weights of neural networks and to remove structure units from the networks, e. g. neurons. We apply and further develop this approach for gated recurrent architectures. Specifically, in addition to sparsification of individual weights and neurons, we propose to sparsify preactivations of gates and information flow in LSTM. It makes some gates and information flow components constant, speeds up forward pass and improves compression. Moreover, the resulting structure of gate sparsity is interpretable and depends on the task. Code is available on github: https://github.com/tipt0p/SparseBayesianRNN