Flatness-Aware Minimization for Domain Generalization
This work addresses domain generalization for machine learning models, offering an incremental improvement by introducing a new optimizer based on flatness-aware principles.
The paper tackles the problem of optimizer selection in domain generalization by proposing Flatness-Aware Minimization (FAD), which optimizes for flatness of the loss landscape and is reported to outperform default optimizers such as Adam on various datasets.
Domain generalization (DG) seeks to learn robust models that generalize well under unknown distribution shifts. As a critical aspect of DG, optimizer selection has not been explored in depth. Currently, most DG methods follow the widely used benchmark, DomainBed, and utilize Adam as the default optimizer for all datasets. However, we reveal that Adam is not necessarily the optimal choice for the majority of current DG methods and datasets. Based on the perspective of loss landscape flatness, we propose a novel approach, Flatness-Aware Minimization for Domain Generalization (FAD), which can efficiently optimize both zeroth-order and first-order flatness simultaneously for DG. We provide theoretical analyses of FAD's out-of-distribution (OOD) generalization error and convergence. Our experimental results demonstrate the superiority of FAD on various DG datasets. Additionally, we confirm that FAD is capable of discovering flatter optima than other zeroth-order and first-order flatness-aware optimization methods.
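To make the two notions of flatness concrete, the following is a minimal sketch on a toy quadratic loss. It is not the FAD algorithm, whose update rule is not given here. It assumes the definitions commonly used in the sharpness-aware literature: zeroth-order flatness as the worst-case loss increase within a ball of radius rho, and first-order flatness as rho times the worst-case gradient norm within that ball. Both maxima are approximated by a single normalized ascent step. All function names, the matrix A, and the value of rho are hypothetical and chosen only for illustration.

```python
import numpy as np

# Toy loss L(theta) = 0.5 * theta^T A theta with symmetric A (illustrative assumption).
def quad_loss(theta, A):
    return 0.5 * theta @ A @ theta

def quad_grad(theta, A):
    return A @ theta

def zeroth_order_flatness(theta, A, rho):
    """Approximate max_{||d|| <= rho} L(theta + d) - L(theta)
    with one normalized gradient-ascent step."""
    g = quad_grad(theta, A)
    delta = rho * g / (np.linalg.norm(g) + 1e-12)
    return quad_loss(theta + delta, A) - quad_loss(theta, A)

def first_order_flatness(theta, A, rho):
    """Approximate rho * max_{||d|| <= rho} ||grad L(theta + d)||
    with one normalized ascent step on the gradient norm.
    For this quadratic, the ascent direction of ||grad L|| is A @ grad L."""
    g = quad_grad(theta, A)
    direction = A @ g
    delta = rho * direction / (np.linalg.norm(direction) + 1e-12)
    return rho * np.linalg.norm(quad_grad(theta + delta, A))

if __name__ == "__main__":
    theta = np.array([0.1, 0.1])
    rho = 0.05
    A_flat = np.eye(2)           # low curvature
    A_sharp = 10.0 * np.eye(2)   # high curvature
    for name, A in [("flat", A_flat), ("sharp", A_sharp)]:
        z = zeroth_order_flatness(theta, A, rho)
        f = first_order_flatness(theta, A, rho)
        print(f"{name}: zeroth-order={z:.5f}, first-order={f:.5f}")
```

Under these assumptions, both measures come out larger for the high-curvature loss than for the low-curvature one at the same point, which is the sense in which a flatness-aware optimizer would prefer the latter region.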