DSD$^2$: Can We Dodge Sparse Double Descent and Compress the Neural Network Worry-Free?
The work addresses a practical problem for deep learning practitioners: it enables more reliable neural network compression without performance degradation, though the contribution is incremental in nature.
The paper tackles the sparse double descent phenomenon in deep learning, where test performance worsens, improves, and then declines with increasing sparsity, by proposing a learning framework that avoids it and improves generalization, supported by empirical evidence.
Recent works have shown that modern deep learning models can exhibit a sparse double descent phenomenon. Indeed, as the sparsity of the model increases, the test performance first worsens since the model is overfitting the training data; then, the overfitting reduces, leading to an improvement in performance; and finally, the model begins to forget critical information, resulting in underfitting. Such behavior prevents the use of traditional early-stopping criteria. In this work, we make three key contributions. First, we propose a learning framework that avoids this phenomenon and improves generalization. Second, we introduce an entropy measure that provides more insight into the onset of this phenomenon and enables the use of traditional stopping criteria. Third, we provide a comprehensive quantitative analysis of contingent factors such as re-initialization methods, model width and depth, and dataset noise. The contributions are supported by empirical evidence in typical setups. Our code is available at https://github.com/VGCQ/DSD2.