CV LGJul 15, 2021

Compact and Optimal Deep Learning with Recurrent Parameter Generators

Jiayun Wang, Yubei Chen, Stella X. Yu, Brian Cheung, Yann LeCun

arXiv:2107.07110v36.56 citationsHas Code

Originality Highly original

AI Analysis

This work addresses the need for compact and efficient deep learning models for deployment in resource-constrained environments, offering a novel approach that is not incremental but introduces a new paradigm.

The paper tackles the problem of model compression in deep learning by proposing a recurrent parameter generator (RPG) that decouples degrees of freedom from the number of parameters, enabling one-stage end-to-end learning with random constraints. It achieves 96% of ResNet18's ImageNet accuracy with only 18% degrees of freedom and 52% of ResNet34's accuracy with 0.25% degrees of freedom.

Deep learning has achieved tremendous success by training increasingly large models, which are then compressed for practical deployment. We propose a drastically different approach to compact and optimal deep learning: We decouple the Degrees of freedom (DoF) and the actual number of parameters of a model, optimize a small DoF with predefined random linear constraints for a large model of arbitrary architecture, in one-stage end-to-end learning. Specifically, we create a recurrent parameter generator (RPG), which repeatedly fetches parameters from a ring and unpacks them onto a large model with random permutation and sign flipping to promote parameter decorrelation. We show that gradient descent can automatically find the best model under constraints with faster convergence. Our extensive experimentation reveals a log-linear relationship between model DoF and accuracy. Our RPG demonstrates remarkable DoF reduction and can be further pruned and quantized for additional run-time performance gain. For example, in terms of top-1 accuracy on ImageNet, RPG achieves $96\%$ of ResNet18's performance with only $18\%$ DoF (the equivalent of one convolutional layer) and $52\%$ of ResNet34's performance with only $0.25\%$ DoF! Our work shows a significant potential of constrained neural optimization in compact and optimal deep learning.

View on arXiv PDF Code

Similar