LG · AI · Aug 4, 2023

Frustratingly Easy Model Generalization by Dummy Risk Minimization

arXiv:2308.02287v2 · 3 citations · h-index: 20
Originality: Incremental advance
AI Analysis

This paper addresses a fundamental problem in machine learning, model generalization, with an advance that is incremental yet broadly applicable.

The paper tackles the limited generalization ability of empirical risk minimization (ERM) by proposing Dummy Risk Minimization (DuRM), a simple technique that enlarges the dimension of the output logits. The authors report consistent gains across diverse tasks, including classification and semantic segmentation.

Empirical risk minimization (ERM) is a fundamental machine learning paradigm. However, its generalization ability is limited in various tasks. In this paper, we devise Dummy Risk Minimization (DuRM), a frustratingly easy and general technique to improve the generalization of ERM. DuRM is extremely simple to implement: just enlarge the dimension of the output logits and then optimize using standard gradient descent. Moreover, we validate the efficacy of DuRM through both theoretical and empirical analysis. Theoretically, we show that DuRM yields greater variance of the gradient, which facilitates model generalization by reaching better flat local minima. Empirically, we evaluate DuRM across different datasets, modalities, and network architectures on diverse tasks, including conventional classification, semantic segmentation, out-of-distribution generalization, adversarial training, and long-tailed recognition. Results demonstrate that DuRM consistently improves performance on all tasks in an almost free-lunch manner. Furthermore, we show that DuRM is compatible with existing generalization techniques, and we discuss possible limitations. We hope that DuRM will trigger new interest in fundamental research on risk minimization.
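The abstract's recipe (enlarge the output logits, then train with standard gradient descent) can be illustrated with a minimal sketch. This is not the authors' code. It assumes a single linear classification head trained with softmax cross-entropy, and every name here (`make_head`, `sgd_step`, `predict`, `num_dummy`) is hypothetical. Restricting inference to the real classes is also an assumption of this sketch rather than something the abstract states.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def make_head(in_dim, num_classes, num_dummy, seed=0):
    """Linear head with num_classes real outputs plus num_dummy extra outputs.

    The extra (dummy) outputs never appear as labels; they only enlarge
    the logit dimension. `num_dummy` is a hypothetical parameter name.
    """
    rng = random.Random(seed)
    out_dim = num_classes + num_dummy
    return [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def forward(weights, x):
    """Compute one logit per output row, dummy rows included."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def sgd_step(weights, x, label, lr=0.1):
    """One plain SGD step on softmax cross-entropy over all logits.

    The softmax normalizes over real and dummy logits together, so the
    dummy rows receive gradients even though no sample is labeled with them.
    Returns the loss measured before the update.
    """
    probs = softmax(forward(weights, x))
    loss = -math.log(probs[label])
    for k, row in enumerate(weights):
        grad_logit = probs[k] - (1.0 if k == label else 0.0)
        for j in range(len(row)):
            row[j] -= lr * grad_logit * x[j]
    return loss


def predict(weights, x, num_classes):
    """Predict among the real classes only (an assumption of this sketch)."""
    logits = forward(weights, x)[:num_classes]
    return max(range(num_classes), key=lambda k: logits[k])
```

With `num_dummy=0` the sketch reduces to ordinary ERM on a linear softmax classifier, so the only difference between the two settings is the width of the output layer.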
