LGMLJun 8, 2020

Rethinking Importance Weighting for Deep Learning under Distribution Shift

arXiv:2006.04662v2170 citations
AI Analysis

This addresses distribution shift in deep learning for practitioners, but it is incremental as it builds on existing importance weighting techniques.

The paper tackles the problem of importance weighting under distribution shift in deep learning by identifying a circular dependency between weight estimation and weighted classification, and proposes dynamic IW as an end-to-end solution that iterates between these steps. Experiments on three datasets show it compares favorably with state-of-the-art methods.

Under distribution shift (DS) where the training data distribution differs from the test one, a powerful technique is importance weighting (IW) which handles DS in two separate steps: weight estimation (WE) estimates the test-over-training density ratio and weighted classification (WC) trains the classifier from weighted training data. However, IW cannot work well on complex data, since WE is incompatible with deep learning. In this paper, we rethink IW and theoretically show it suffers from a circular dependency: we need not only WE for WC, but also WC for WE where a trained deep classifier is used as the feature extractor (FE). To cut off the dependency, we try to pretrain FE from unweighted training data, which leads to biased FE. To overcome the bias, we propose an end-to-end solution dynamic IW that iterates between WE and WC and combines them in a seamless manner, and hence our WE can also enjoy deep networks and stochastic optimizers indirectly. Experiments with two representative types of DS on three popular datasets show that our dynamic IW compares favorably with state-of-the-art methods.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes