LG ST Oct 21, 2025

Provable Generalization Bounds for Deep Neural Networks with Momentum-Adaptive Gradient Dropout

arXiv:2510.18410v2
Originality: Incremental advance
AI Analysis

This work addresses generalization in deep learning for high-stakes applications and offers a new regularization method with theoretical justification. The advance appears incremental because it builds on existing dropout and momentum techniques.

The paper tackles overfitting in deep neural networks by introducing Momentum-Adaptive Gradient Dropout (MAGDrop), a regularization method that dynamically adjusts dropout rates based on gradients and momentum, achieving up to 29.2% tighter generalization bounds and competitive performance on MNIST (99.52%) and CIFAR-10 (92.03%).

Deep neural networks (DNNs) achieve remarkable performance but often suffer from overfitting due to their high capacity. We introduce Momentum-Adaptive Gradient Dropout (MAGDrop), a novel regularization method that dynamically adjusts dropout rates on activations based on current gradients and accumulated momentum, enhancing stability in non-convex optimization landscapes. To theoretically justify MAGDrop's effectiveness, we derive a non-asymptotic, computable PAC-Bayes generalization bound that accounts for its adaptive nature, achieving up to 29.2% tighter bounds compared to standard approaches by leveraging momentum-driven perturbation control. Empirically, the activation-based MAGDrop achieves competitive performance on MNIST (99.52%) and CIFAR-10 (92.03%), with generalization gaps of 0.48% and 6.52%, respectively. We provide fully reproducible code and numerical computation of our bounds to validate our theoretical claims. Our work bridges theoretical insights and practical advancements, offering a robust framework for enhancing DNN generalization, making it suitable for high-stakes applications.
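
The abstract does not state the rule MAGDrop uses to turn gradients and momentum into dropout rates, so the following is only a minimal NumPy sketch of the general idea, not the paper's method. The function names (`magdrop_rates`, `apply_dropout`), the scoring rule (sum of absolute gradient and absolute momentum, normalized by its mean), the assumption that a larger score means a higher dropout rate, and every parameter value (`base_rate`, `beta`, `p_min`, `p_max`) are hypothetical choices made for illustration.

```python
import numpy as np

def magdrop_rates(grad, momentum, base_rate=0.5, beta=0.9, p_min=0.05, p_max=0.9):
    """Hypothetical rule: per-unit dropout rate from gradient and momentum magnitudes."""
    # Update an exponential moving average of the gradient (a common momentum form).
    momentum = beta * momentum + (1.0 - beta) * grad
    # Combine the current and accumulated signal into one non-negative score per unit.
    score = np.abs(grad) + np.abs(momentum)
    # Normalize by the mean score so a unit with an average score gets base_rate.
    scale = score / (score.mean() + 1e-12)
    # ASSUMPTION: a higher score means a higher dropout rate; the paper may differ.
    rates = np.clip(base_rate * scale, p_min, p_max)
    return rates, momentum

def apply_dropout(activations, rates, rng):
    """Inverted dropout with a per-unit rate; rescaling keeps the expected value."""
    keep = 1.0 - rates
    mask = rng.random(activations.shape) < keep
    return activations * mask / keep

rng = np.random.default_rng(0)
acts = rng.normal(size=(4, 8))   # batch of 4 examples, 8 hidden units
grad = rng.normal(size=8)        # stand-in for the per-unit gradient signal
mom = np.zeros(8)                # momentum buffer starts at zero
rates, mom = magdrop_rates(grad, mom)
dropped = apply_dropout(acts, rates, rng)
```

Clipping the rates to `[p_min, p_max]` keeps the keep-probability away from zero, so the inverted-dropout rescaling stays finite. In a real training loop the gradient signal would come from backpropagation rather than from random draws.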

