LG ML | Oct 8, 2019

Differentiable Sparsification for Deep Neural Networks

arXiv:1910.03201v6 | 8 citations

Originality: Incremental advance

AI Analysis

The paper addresses the resource-intensive task of reducing network size, a practical concern for deep learning practitioners. It appears incremental, however, because it builds on existing sparsification techniques.

The paper addresses two problems: determining effective architectures and reducing the size of deep neural networks. It proposes a fully differentiable sparsification method that zeros out unimportant parameters by directly optimizing a regularized objective with stochastic gradient descent, so that the sparsified structure and the weights are learned end to end.

Deep neural networks have significantly alleviated the burden of feature engineering, but comparable efforts are now required to determine effective architectures for these networks. Furthermore, as network sizes have become excessively large, a substantial amount of resources is invested in reducing their sizes. These challenges can be effectively addressed through the sparsification of over-complete models. In this study, we propose a fully differentiable sparsification method for deep neural networks, which can zero out unimportant parameters by directly optimizing a regularized objective function with stochastic gradient descent. Consequently, the proposed method can learn both the sparsified structure and weights of a network in an end-to-end manner. It can be directly applied to various modern deep neural networks and requires minimal modification to the training process. To the best of our knowledge, this is the first fully differentiable sparsification method.
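
The abstract does not spell out the parameterization, so the following is only a toy sketch of the general idea of reaching exact zeros with plain gradient steps on a regularized objective. It is not the paper's formulation. It assumes a soft-threshold reparameterization (the hypothetical `soft_threshold` helper with a fixed threshold `t`), an L1 penalty with weight `lam`, and a small made-up regression problem in which only the first two of four features matter. Full-batch gradient descent is used instead of SGD to keep the run deterministic.

```python
# Toy illustration only: a soft-threshold reparameterization,
# NOT the formulation used in the paper.

def soft_threshold(v, t):
    """Map a free parameter v to a weight that is exactly 0 when |v| <= t."""
    mag = max(abs(v) - t, 0.0)
    if mag == 0.0:
        return 0.0
    return mag if v > 0 else -mag

def train(steps=300, lr=0.1, lam=0.1, t=0.1):
    # Orthogonal +/-1 design: 8 samples with features a, b, c, a*b.
    X = [[a, b, c, a * b]
         for a in (-1.0, 1.0) for b in (-1.0, 1.0) for c in (-1.0, 1.0)]
    # Hypothetical ground truth: only features 0 and 1 carry signal.
    y = [2.0 * x[0] + 3.0 * x[1] for x in X]
    n, d = len(X), 4
    v = [0.5] * d  # free parameters; must start outside the dead zone |v| <= t
    for _ in range(steps):
        w = [soft_threshold(vi, t) for vi in v]
        residual = [sum(wj * xj for wj, xj in zip(w, x)) - yi
                    for x, yi in zip(X, y)]
        for i in range(d):
            if w[i] == 0.0:
                continue  # gate closed: dw_i/dv_i = 0, so v_i no longer moves
            # Gradient of 0.5 * mean squared error with respect to w_i.
            grad_loss = sum(r * x[i] for r, x in zip(residual, X)) / n
            # Gradient of lam * |w_i| with respect to w_i.
            grad_pen = lam if w[i] > 0 else -lam
            v[i] -= lr * (grad_loss + grad_pen)
    return [soft_threshold(vi, t) for vi in v]

weights = train()
print(weights)
```

In this toy setup the two irrelevant weights drift into the dead zone and stay at exactly zero, while the two relevant weights settle slightly below their true values because of the penalty. A parameter that starts inside the dead zone would never receive a gradient, which is why the initial values are placed outside it.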
