Towards Better Generalization via Distributional Input Projection Network
For deep learning researchers and practitioners, this work addresses the limited insight that training loss gives into generalization and offers a general method for improving test performance, though it appears incremental because it builds on existing smoothness concepts.
The paper aims to improve generalization in overparameterized models by introducing Distributional Input Projection Networks (DIPNet), a framework that projects inputs into learnable distributions at each layer to smooth the loss landscape with respect to the input. The authors report improved test performance across a range of architectures and tasks.
As overparameterized models become increasingly prevalent, training loss alone offers limited insight into generalization performance. While smoothness has been linked to improved generalization across various settings, directly enforcing smoothness in neural networks remains challenging. To address this, we introduce Distributional Input Projection Networks (DIPNet), a novel framework that projects inputs into learnable distributions at each layer. This distributional representation induces a smoother loss landscape with respect to the input, promoting better generalization. We provide theoretical analysis showing that DIPNet reduces both local smoothness measures and the Lipschitz constant of the network, contributing to improved generalization performance. Empirically, we validate DIPNet across a wide range of architectures and tasks, including Vision Transformers (ViTs), Large Language Models (LLMs), ResNets, and MLPs. Our method consistently enhances test performance under standard settings, adversarial attacks, out-of-distribution inputs, and reasoning benchmarks. We demonstrate that the proposed input projection strategy can be seamlessly integrated into existing models, providing a general and effective approach for boosting generalization performance in modern deep learning.
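The smoothing intuition behind the abstract can be illustrated with a toy sketch. This is not the paper's architecture. It assumes a scalar input, a Gaussian projection N(x, sigma^2) with a fixed rather than learned scale, and a hypothetical steep function `sharp_fn` standing in for a non-smooth network. The names `GaussianInputProjection`, `sharp_fn`, and `max_slope` are illustrative only. The sketch compares the largest finite-difference slope (a rough empirical proxy for a Lipschitz constant) of the function before and after averaging over the projected input distribution.

```python
import math
import random


class GaussianInputProjection:
    """Hypothetical stand-in: maps a scalar input x to the distribution N(x, sigma^2)."""

    def __init__(self, log_sigma=-1.0, n_samples=2000, seed=0):
        # Assumption: log_sigma would be a learnable parameter in a real model;
        # it is held fixed here to keep the sketch small.
        self.sigma = math.exp(log_sigma)
        rng = random.Random(seed)
        # Fixed standard-normal draws (common random numbers), so the
        # Monte Carlo estimate below is a deterministic function of x.
        self.eps = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

    def expect(self, fn, x):
        """Monte Carlo estimate of E[fn(x + sigma * eps)]."""
        return sum(fn(x + self.sigma * e) for e in self.eps) / len(self.eps)


def sharp_fn(x):
    # Steep, nearly step-like response standing in for a non-smooth network.
    return math.tanh(50.0 * x)


def max_slope(fn, lo=-1.0, hi=1.0, steps=40):
    """Largest finite-difference slope of fn on a uniform grid over [lo, hi]."""
    h = (hi - lo) / steps
    xs = [lo + i * h for i in range(steps + 1)]
    ys = [fn(x) for x in xs]
    return max(abs(b - a) / h for a, b in zip(ys, ys[1:]))


proj = GaussianInputProjection()
sharp_slope = max_slope(sharp_fn)
smooth_slope = max_slope(lambda x: proj.expect(sharp_fn, x))
print(f"max slope without projection: {sharp_slope:.2f}")
print(f"max slope with projection:    {smooth_slope:.2f}")
```

In this toy setting the averaged function changes far more gradually than the original, which is the qualitative effect the abstract attributes to distributional input projection. How DIPNet parameterizes and learns the distributions at each layer is not specified in the abstract and is not modeled here.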