Learning Sparse Filters in Deep Convolutional Neural Networks with a l1/l2 Pseudo-Norm
This work addresses the problem of deploying deep neural networks on resource-limited devices by providing a simpler, more efficient compression method, though it appears incremental because it builds on existing sparsity techniques.
The paper tackles the high memory and computation costs of deep neural networks by introducing a sparsity-inducing regularization term, based on an l1/l2 pseudo-norm, that reduces the number of filter kernels and yields very compact models without iterative retraining. Experimental results on MNIST and CIFAR-10 show a significant reduction in the number of filters in models such as LeNet and VGG while maintaining or improving accuracy, with a better sparsity-accuracy trade-off than l1 or l2 regularization and the SSL, NISP, and GAL methods.
While deep neural networks (DNNs) have proven to be effective for numerous tasks, they come at a high memory and computation cost, which makes them impractical on resource-limited devices. These networks are known to contain a large number of parameters, yet recent research has shown that their structure can be made more compact without compromising their performance. In this paper, we present a sparsity-inducing regularization term based on the l1/l2 ratio pseudo-norm defined on the filter coefficients. By defining this pseudo-norm appropriately for the different filter kernels and removing irrelevant filters, the number of kernels in each layer can be drastically reduced, leading to very compact Deep Convolutional Neural Network (DCNN) structures. Unlike numerous existing methods, our approach does not require an iterative retraining process: using this regularization term, it directly produces a sparse model during training. Furthermore, our approach is much simpler to implement than existing methods. Experimental results on MNIST and CIFAR-10 show that our approach significantly reduces the number of filters of classical models such as LeNet and VGG while reaching the same or even better accuracy than the baseline models. Moreover, the trade-off between sparsity and accuracy is compared with that of other regularization terms based on the l1 or l2 norm, as well as with the SSL, NISP and GAL methods, and our approach outperforms them.
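To make the idea of an l1/l2 ratio penalty concrete, the following is a minimal sketch in plain Python. It is not the paper's formulation: the abstract does not give the exact definition of the pseudo-norm, so the sketch assumes one plausible reading in which the ratio is taken over the vector of per-filter l2 norms within a layer. The function names (`filter_norms`, `l1_l2_ratio`, `layer_penalty`, `prune`), the `eps` stabilizer, and the pruning threshold are all hypothetical choices made for illustration.

```python
import math

def filter_norms(filters):
    """Return the l2 norm of each filter (each filter is a flat list of coefficients)."""
    return [math.sqrt(sum(w * w for w in f)) for f in filters]

def l1_l2_ratio(values, eps=1e-12):
    """Ratio ||v||_1 / ||v||_2; eps is an assumed guard against division by zero."""
    l1 = sum(abs(v) for v in values)
    l2 = math.sqrt(sum(v * v for v in values))
    return l1 / (l2 + eps)

def layer_penalty(filters):
    """Assumed layer-level penalty: l1/l2 ratio of the per-filter norms.

    The ratio is close to 1 when a single filter carries all the energy and
    grows toward sqrt(number of filters) when all filters have equal norm,
    so minimizing it favors layers with few active filters.
    """
    return l1_l2_ratio(filter_norms(filters))

def prune(filters, threshold=1e-3):
    """Drop filters whose l2 norm does not exceed an assumed threshold."""
    return [f for f, n in zip(filters, filter_norms(filters)) if n > threshold]

# Four equal-norm filters versus one active filter and three zeroed ones.
dense = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
sparse = [[2.0, 2.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

dense_penalty = layer_penalty(dense)
sparse_penalty = layer_penalty(sparse)
kept = prune(sparse)
```

In this toy setting the layer with one active filter receives a lower penalty than the layer with four equal-norm filters, which is the behavior a sparsity-inducing term needs. In actual training such a term would be added to the task loss with a weighting coefficient and differentiated by the framework; those details are left out here.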