LG · CV · Jun 17, 2022

Sparse Double Descent: Where Network Pruning Aggravates Overfitting

arXiv:2206.08684v1 · 36 citations · h-index: 30
Originality: Highly original
AI Analysis

This work challenges common assumptions in deep learning by showing that pruning may not always prevent overfitting, which is important for researchers and practitioners optimizing model efficiency and generalization.

The paper finds that network pruning can sometimes worsen overfitting. It reports a sparse double descent phenomenon, in which test performance first degrades, then improves, then degrades again as sparsity increases, and proposes a learning distance interpretation to explain the effect.

It is commonly believed that network pruning not only reduces the computational cost of deep networks but also prevents overfitting by decreasing model capacity. However, our work surprisingly discovers that network pruning can sometimes aggravate overfitting. We report an unexpected sparse double descent phenomenon: as we increase model sparsity via network pruning, test performance first gets worse (due to overfitting), then gets better (due to relieved overfitting), and finally gets worse again (due to forgetting useful information). While recent studies have focused on deep double descent with respect to model overparameterization, they failed to recognize that sparsity may also cause double descent. This paper makes three main contributions. First, we report the novel sparse double descent phenomenon through extensive experiments. Second, we propose a novel learning distance interpretation of this phenomenon: the curve of the $\ell_{2}$ learning distance of sparse models (from initialized parameters to final parameters) may correlate well with the sparse double descent curve and reflect generalization better than minima flatness. Third, in the context of sparse double descent, a winning ticket in the lottery ticket hypothesis surprisingly may not always win.
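The $\ell_{2}$ learning distance mentioned in the abstract can be illustrated with a minimal sketch. The abstract defines it only as the distance from initialized parameters to final parameters. The rest is an assumption made for illustration, not the authors' code: the choice of magnitude pruning, the restriction of the distance to surviving (unpruned) weights, and the hypothetical function names `magnitude_mask` and `learning_distance`.

```python
import numpy as np


def magnitude_mask(weights, sparsity):
    """Boolean mask that drops the `sparsity` fraction of smallest-magnitude weights.

    Magnitude pruning is an assumed criterion; the abstract does not name one.
    """
    flat = np.abs(weights).ravel()
    k = int(round(sparsity * flat.size))  # number of weights to prune
    mask = np.ones(flat.size, dtype=bool)
    if k > 0:
        # argpartition places the k smallest magnitudes in the first k slots
        prune_idx = np.argpartition(flat, k - 1)[:k]
        mask[prune_idx] = False
    return mask.reshape(weights.shape)


def learning_distance(theta_init, theta_final, mask=None):
    """l2 distance between initial and final parameters.

    If `mask` is given, only surviving weights are compared (an assumption
    about how the distance is applied to sparse models).
    """
    diff = theta_final - theta_init
    if mask is not None:
        diff = diff[mask]
    return float(np.linalg.norm(diff.ravel()))


# Toy example with hand-picked values (not experimental data).
theta_init = np.zeros(4)
theta_final = np.array([3.0, -0.1, 4.0, 0.2])
mask = magnitude_mask(theta_final, sparsity=0.5)
distance = learning_distance(theta_init, theta_final, mask)
print(mask, distance)
```

In this toy case the two smallest-magnitude weights are pruned and the remaining distance is 5.0. To compare against a sparse double descent curve, one would compute this quantity for a model trained at each sparsity level and plot it alongside test accuracy.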

Code Implementations: 1 repo
