A Method for Enhancing Generalization of Adam by Multiple Integrations
This work addresses a known weakness of adaptive optimization methods, the comparatively poor generalization of Adam, and offers an incremental improvement of practical interest to machine learning practitioners.
The paper tackles the insufficient generalization of the Adam optimizer by proposing MIAdam, which adds a multiple integral term to filter out sharp minima and steer optimization toward flatter regions. Experimental results show that MIAdam improves generalization and robustness to label noise while keeping Adam's rapid convergence, outperforming Adam and its variants on benchmarks.
The insufficient generalization of adaptive moment estimation (Adam) has hindered its broader application. Recent studies have shown that flat minima in loss landscapes are strongly associated with improved generalization. Inspired by the filtering effect of integration operations on high-frequency signals, we propose multiple integral Adam (MIAdam), a novel optimizer that incorporates a multiple integral term into Adam. This multiple integral term effectively filters out sharp minima encountered during optimization, guiding the optimizer toward flatter regions and thereby enhancing its generalization capability. We provide a theoretical explanation for the improvement in generalization within the framework of diffusion theory and analyze the impact of the multiple integral term on the optimizer's convergence. Experimental results demonstrate that MIAdam not only enhances generalization and robustness against label noise but also maintains the rapid convergence characteristic of Adam, outperforming Adam and its variants on state-of-the-art benchmarks.
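The abstract does not give MIAdam's update rule, so the sketch below illustrates only the premise it builds on: integration attenuates high-frequency signals, and stacking several integrations attenuates them further. As our own assumption, each integration stage is modelled as a discrete leaky integrator (an exponential moving average with decay `beta`), and `leaky_integrate`, `multiple_integrate`, and `amplitude_gain` are hypothetical helper names. This is not the paper's implementation. The script measures how much of a slow and of a fast unit sinusoid survives one, two, and three cascaded stages.

```python
import math


def leaky_integrate(signal, beta):
    """One discrete leaky integrator: y_t = beta * y_{t-1} + (1 - beta) * x_t.

    This is an exponential moving average, a first-order low-pass filter
    whose gain at zero frequency is exactly 1.
    """
    out, y = [], 0.0
    for x in signal:
        y = beta * y + (1.0 - beta) * x
        out.append(y)
    return out


def multiple_integrate(signal, beta, order):
    """Cascade `order` leaky integrators.

    Hypothetical stand-in for a multiple integral term (our assumption,
    not the formula used by MIAdam).
    """
    for _ in range(order):
        signal = leaky_integrate(signal, beta)
    return signal


def amplitude_gain(freq, beta, order, n=4000):
    """Steady-state output/input amplitude for a unit sinusoid of `freq`
    cycles per step.

    The first half of the run is discarded as start-up transient, and the
    RMS of the second half is converted to an amplitude. This is exact when
    0 < freq < 0.5 and the tail spans a whole number of periods.
    """
    x = [math.sin(2.0 * math.pi * freq * t) for t in range(n)]
    tail = multiple_integrate(x, beta, order)[n // 2:]
    mean_square = sum(v * v for v in tail) / len(tail)
    return math.sqrt(2.0 * mean_square)


if __name__ == "__main__":
    beta = 0.9
    slow, fast = 0.01, 0.25  # slow drift vs. rapid oscillation (cycles per step)
    for order in (1, 2, 3):
        g_slow = amplitude_gain(slow, beta, order)
        g_fast = amplitude_gain(fast, beta, order)
        print(f"order={order}: slow gain={g_slow:.4f}, fast gain={g_fast:.5f}")
```

With `beta = 0.9`, a single stage passes about 86% of the slow component but only about 7% of the fast one, and every additional stage multiplies each gain by the same single-stage factor, so the fast component vanishes far sooner than the slow one. Reading the slow component as drift across a flat basin and the fast component as bouncing inside a sharp valley is our interpretation of the abstract's argument, not a result reported in the paper.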