PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
This work addresses the challenge of explaining why deep learning generalizes, a question central to machine learning research and practice, though its contribution may appear incremental as an improvement on existing bounds.
The paper tackles the problem of uninformative generalization bounds for deep neural networks. It develops a compression approach based on quantizing parameters in a linear subspace, achieves state-of-the-art generalization bounds on a variety of tasks, including transfer learning, and finds that large models can be compressed more than previously known.
While there has been progress in developing non-vacuous generalization bounds for deep neural networks, these bounds tend to be uninformative about why deep learning works. In this paper, we develop a compression approach based on quantizing neural network parameters in a linear subspace, profoundly improving on previous results to provide state-of-the-art generalization bounds on a variety of tasks, including transfer learning. We use these tight bounds to better understand the role of model size, equivariance, and the implicit biases of optimization in generalization in deep learning. Notably, we find that large models can be compressed to a much greater extent than previously known, encapsulating Occam's razor. We also argue for data-independent bounds in explaining generalization.
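The pipeline the abstract describes (restrict parameters to a linear subspace, quantize the subspace coordinates, count the bits needed to describe the result, and turn that count into a generalization bound) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`random_subspace`, `quantize_uniform`, `code_length_bits`, `occam_bound`) are hypothetical. The projection is a seeded Gaussian matrix so a decoder can regenerate it for free. Quantization uses a uniform codebook with a fixed-length code, where an entropy coder would typically give shorter descriptions. The bound is the classic Occam bound for a loss in [0, 1] with prior 2^(-bits), which is looser than the PAC-Bayes bounds a paper like this would use.

```python
import math
import random


def random_subspace(D, d, seed=0):
    """Hypothetical helper: a D x d Gaussian projection P generated from a fixed
    seed, so the decoder can regenerate it without spending any bits on it."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0 / math.sqrt(d)) for _ in range(d)] for _ in range(D)]


def reconstruct(theta0, P, w):
    """Map subspace coordinates w back to full parameters: theta = theta0 + P w."""
    return [t + sum(p * x for p, x in zip(row, w)) for t, row in zip(theta0, P)]


def quantize_uniform(w, levels):
    """Snap each coordinate of w to the nearest of `levels` evenly spaced values.
    Returns the codebook and one integer index per coordinate."""
    assert levels >= 2, "need at least two quantization levels"
    lo, hi = min(w), max(w)
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codebook = [lo + k * step for k in range(levels)]
    idx = [min(levels - 1, max(0, round((x - lo) / step))) for x in w]
    return codebook, idx


def code_length_bits(idx, levels, float_bits=32):
    """Bits for a fixed-length code: ceil(log2 levels) bits per index, plus the
    codebook stored as `float_bits`-bit floats. Assumes d and levels are fixed
    before seeing data, so every message has the same length (prefix-free)."""
    return len(idx) * math.ceil(math.log2(levels)) + levels * float_bits


def occam_bound(train_error, bits, n, delta=0.05):
    """Occam bound for a [0, 1]-valued loss with prior P(h) = 2^(-bits):
    with probability >= 1 - delta over n i.i.d. samples,
    R(h) <= R_hat(h) + sqrt((bits * ln 2 + ln(1/delta)) / (2n))."""
    slack = math.sqrt((bits * math.log(2) + math.log(1.0 / delta)) / (2 * n))
    return train_error + slack


if __name__ == "__main__":
    D, d, levels = 1000, 50, 16           # illustrative sizes, not from the paper
    P = random_subspace(D, d)
    rng = random.Random(1)
    w = [rng.gauss(0.0, 1.0) for _ in range(d)]   # stand-in for trained coordinates
    codebook, idx = quantize_uniform(w, levels)
    theta = reconstruct([0.0] * D, P, [codebook[i] for i in idx])
    bits = code_length_bits(idx, levels)  # 50 * 4 + 16 * 32 = 712 bits
    # The training error of 0.02 and n = 60000 are hypothetical inputs.
    print(bits, occam_bound(0.02, bits, n=60000))
```

The point the sketch makes concrete is that the bound depends on the model's description length (here about 712 bits for 50 coordinates), not on the raw parameter count D. This is why compressing a large model into a small, quantized subspace can yield a tight bound.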