LG · Nov 14, 2024

Complexity-Aware Training of Deep Neural Networks for Optimal Structure Discovery

Valentin Frank Ingmar Guenter, Athanasios Sideris

arXiv:2411.09127v2 · 1 citation · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses the challenge of efficiently compressing neural networks for deployment, though it is incremental as it builds on prior pruning techniques.

The authors tackled the problem of pruning deep neural networks during training without needing a pre-trained model, achieving improved pruning ratios and test accuracy on CIFAR-10/100 and ImageNet datasets compared to existing methods.

We propose a novel algorithm for combined unit and layer pruning of deep neural networks that operates during training and does not require a pre-trained network. Our algorithm optimally trades off learning accuracy and pruning levels while balancing layer vs. unit pruning and computational vs. parameter complexity using only three user-defined parameters, which are easy to interpret and tune. We formulate a stochastic optimization problem over the network weights and the parameters of variational Bernoulli distributions for binary random variables that take values 0 or 1 and scale the units and layers of the network. Optimal network structures are found as the solution to this optimization problem. Pruning occurs when a variational parameter converges to 0, rendering the corresponding structure permanently inactive and thus saving computation during both training and prediction. A key contribution of our approach is a cost function that combines the objectives of prediction accuracy and network pruning in a computational/parameter complexity-aware manner, together with the automatic selection of the many regularization parameters. We show that the proposed algorithm converges to solutions of the optimization problem corresponding to deterministic networks. We analyze the ODE system that underlies our stochastic optimization algorithm and establish domains of attraction for the dynamics of the network parameters. These theoretical results lead to practical pruning conditions that avoid the premature pruning of units and layers during training. We evaluate our method on the CIFAR-10/100 and ImageNet datasets using ResNet architectures and demonstrate that it improves pruning ratios and test accuracy over layer-only or unit-only pruning and competes favorably with combined unit and layer pruning algorithms that require pre-trained networks.
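To make the gating idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' algorithm. It only illustrates binary Bernoulli gates that scale units, an expected-complexity term that depends on the gate parameters, and a rule that freezes a gate parameter at 0. The function names, the dense-layer complexity measure, and the threshold `eps` are all assumptions introduced for illustration; the paper derives its own cost function and pruning conditions.

```python
import random


def sample_gates(thetas, rng):
    """Draw one Bernoulli(theta) gate in {0, 1} per unit."""
    return [1 if rng.random() < t else 0 for t in thetas]


def gated_output(activations, gates):
    """Scale each unit's activation by its binary gate."""
    return [a * z for a, z in zip(activations, gates)]


def expected_complexity(layer_thetas, n_inputs):
    """Expected number of live weights in a chain of dense layers.

    Assumption: gates are independent, so the expected count of live
    weights between two consecutive layers is
    (expected live inputs) * (expected live outputs).
    """
    total = 0.0
    live_in = float(n_inputs)
    for thetas in layer_thetas:
        live_out = sum(thetas)
        total += live_in * live_out
        live_in = live_out
    return total


def prune(thetas, eps=1e-3):
    """Set near-zero gate parameters to exactly 0.

    Hypothetical threshold rule: a unit whose parameter is 0 is never
    sampled as active again, so it can be dropped from the network.
    """
    return [0.0 if t < eps else t for t in thetas]
```

In a training loop one would add a weighted `expected_complexity` term to the prediction loss, so that lowering a gate parameter reduces the penalty in proportion to the computation or parameters that the unit actually costs.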
