LG · Mar 28, 2021

Understanding the role of importance weighting for deep learning

arXiv:2103.15209v1 · 53 citations
Originality: Synthesis-oriented
AI Analysis

This work addresses a theoretical gap in deep learning, offering researchers incremental insights into the effects of importance weighting.

The paper tackles the unclear impact of importance weighting in over-parameterized deep learning models. It provides formal theoretical characterizations of the role of importance weighting in optimization dynamics and generalization performance, explains previously observed phenomena, and extends to settings where the weights are optimized as part of the model.

Based on empirical observations, the recent paper by Byrd & Lipton (2019) raises a major concern about the impact of importance weighting for over-parameterized deep learning models. They observe that as long as the model can separate the training data, the impact of importance weighting diminishes as training proceeds. Nevertheless, a rigorous characterization of this phenomenon is lacking. In this paper, we provide formal characterizations and theoretical justifications of the role of importance weighting with respect to the implicit bias of gradient descent and margin-based learning theory. We characterize both the optimization dynamics and the generalization performance under deep learning models. Our work not only explains the various novel phenomena observed for importance weighting in deep learning, but also extends to studies where the weights are optimized as part of the model, which applies to a number of topics under active research.
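The diminishing effect described in the abstract can be illustrated with a toy sketch. This is not the paper's experimental setup. It assumes a linear model without a bias term, a two-point separable dataset, full-batch gradient descent on an importance-weighted logistic loss, and hypothetical helper names (`train`, `cosine`). It compares the direction of the weighted solution with that of the unweighted one after a short and a long training run.

```python
import math

# Toy separable data (an assumption for illustration): one point per class.
X = [(1.0, 0.0), (0.0, -1.0)]
Y = [1.0, -1.0]


def train(weights, steps, lr=0.1):
    """Full-batch gradient descent on the importance-weighted logistic loss
    sum_i a_i * log(1 + exp(-y_i * <w, x_i>)), starting from w = 0."""
    w = [0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0]
        for (x1, x2), y, a in zip(X, Y, weights):
            margin = y * (w[0] * x1 + w[1] * x2)
            # Gradient of a * log(1 + exp(-margin)) with respect to w
            # is -a * y * x / (1 + exp(margin)).
            coeff = -a * y / (1.0 + math.exp(margin))
            grad[0] += coeff * x1
            grad[1] += coeff * x2
        w[0] -= lr * grad[0]
        w[1] -= lr * grad[1]
    return w


def cosine(u, v):
    """Cosine similarity between two 2-D vectors."""
    dot = u[0] * v[0] + u[1] * v[1]
    return dot / (math.hypot(*u) * math.hypot(*v))


# Up-weight the first example by 10x and compare directions over time.
for steps in (100, 20000):
    w_plain = train([1.0, 1.0], steps)
    w_weighted = train([10.0, 1.0], steps)
    print(steps, cosine(w_plain, w_weighted))
```

In this sketch the cosine similarity between the two solutions should move closer to 1 as training continues, although only slowly, which is in line with the observation that importance weights matter less and less once the training data are separated. The weighted and unweighted solutions still differ in norm and in how quickly they approach the shared direction.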

