Compression Implies Generalization
This work addresses a theoretical gap in understanding generalization in deep learning and offers a framework that applies broadly across machine learning models, although it is incremental in that it extends existing compression ideas.
The paper addresses the problem of proving generalization bounds for deep neural networks. It establishes a compression-based framework that extends previous bounds, which held only for compressed networks, to the original uncompressed networks. It demonstrates the framework's flexibility with simple proofs of strong generalization bounds for Support Vector Machines and Boosting.
Explaining the surprising generalization performance of deep neural networks is an active and important line of research in theoretical machine learning. Influential work by Arora et al. (ICML'18) showed that noise stability properties that deep nets exhibit in practice can be used to provably compress model representations. They then argued that the small representations of the compressed networks imply good generalization performance, albeit only for the compressed nets. Extending their compression framework to yield generalization bounds for the original uncompressed networks remains elusive. Our main contribution is a compression-based framework for proving generalization bounds. The framework is simple yet powerful enough to extend the generalization bounds of Arora et al. so that they also hold for the original network. To demonstrate the flexibility of the framework, we also show that it yields simple proofs of the strongest known generalization bounds for other popular machine learning models, namely Support Vector Machines and Boosting.
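To give some intuition for why a small representation suggests good generalization, the sketch below evaluates the classical counting bound for a finite hypothesis class (Hoeffding's inequality combined with a union bound). It is not the framework of this paper and not the bound of Arora et al.; it is a textbook illustration only. The function name `occam_gap` and its parameters are hypothetical, and the loss is assumed to lie in [0, 1].

```python
import math


def occam_gap(num_bits: int, num_samples: int, delta: float = 0.05) -> float:
    """Textbook bound on (true error - empirical error) that holds
    simultaneously for every hypothesis describable with `num_bits` bits,
    with probability at least 1 - delta over `num_samples` i.i.d. samples.

    Illustrative assumption: the class has at most 2**num_bits members and
    the loss is bounded in [0, 1]. This is not the bound proved in the paper.
    """
    if num_bits < 0 or num_samples <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need num_bits >= 0, num_samples > 0, 0 < delta < 1")
    # ln |H| for a class of at most 2**num_bits hypotheses.
    log_class_size = num_bits * math.log(2.0)
    # Hoeffding + union bound: sqrt((ln|H| + ln(1/delta)) / (2 m)).
    return math.sqrt((log_class_size + math.log(1.0 / delta)) / (2.0 * num_samples))


if __name__ == "__main__":
    m = 50_000  # hypothetical training-set size
    for bits in (10_000, 1_000_000, 100_000_000):
        print(f"{bits:>11} bits -> gap <= {occam_gap(bits, m):.3f}")
```

The bound shrinks as the description length drops, which is the sense in which compression can certify generalization. It applies only to hypotheses that actually have a short description, which mirrors the limitation noted above: a guarantee for the compressed network does not by itself transfer to the original one.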