Mitigating covariate shift in non-colocated data with learned parameter priors
This addresses model reliability issues for practitioners using non-colocated data, but it is incremental as it builds on importance-weighting methods.
The paper tackles the problem of covariate shift in distributed training data, which biases cross-validation and model selection, by introducing FIcsR, a method that minimizes f-divergence and incorporates learned parameter priors, achieving accuracy improvements of over 5% against batch and 10% against fold state-of-the-art.
When training data are distributed across{ time or space,} covariate shift across fragments of training data biases cross-validation, compromising model selection and assessment. We present \textit{Fragmentation-Induced covariate-shift Remediation} ($FIcsR$), which minimizes an $f$-divergence between a fragment's covariate distribution and that of the standard cross-validation baseline. We s{how} an equivalence with popular importance-weighting methods. {The method}'s numerical solution poses a computational challenge owing to the overparametrized nature of a neural network, and we derive a Fisher Information approximation. When accumulated over fragments, this provides a global estimate of the amount of shift remediation thus far needed, and we incorporate that as a prior via the minimization objective. In the paper, we run extensive classification experiments on multiple data classes, over $40$ datasets, and with data batched over multiple sequence lengths. We extend the study to the $k$-fold cross-validation setting through a similar set of experiments. An ablation study exposes the method to varying amounts of shift and demonstrates slower degradation with $FIcsR$ in place. The results are promising under all these conditions; with improved accuracy against batch and fold state-of-the-art by more than $5\%$ and $10\%$, respectively.