LG CV, Nov 10, 2024

Mitigating covariate shift in non-colocated data with learned parameter priors

Behraj Khan, Behroz Mirza, Nouman Durrani, Tahir Syed

arXiv:2411.06499v1 2.6 h-index: 8

Originality: Incremental advance

AI Analysis

This work addresses model reliability issues for practitioners using non-colocated data, but it is an incremental advance because it builds on existing importance-weighting methods.

The paper tackles covariate shift across fragments of distributed training data, which biases cross-validation and model selection. It introduces FIcsR, a method that minimizes an f-divergence and incorporates learned parameter priors, and it reports accuracy improvements of more than 5% over the batch state-of-the-art and more than 10% over the fold state-of-the-art.

When training data are distributed across time or space, covariate shift across fragments of training data biases cross-validation, compromising model selection and assessment. We present \textit{Fragmentation-Induced covariate-shift Remediation} ($FIcsR$), which minimizes an $f$-divergence between a fragment's covariate distribution and that of the standard cross-validation baseline. We show an equivalence with popular importance-weighting methods. The method's numerical solution poses a computational challenge owing to the overparametrized nature of a neural network, and we derive a Fisher Information approximation. When accumulated over fragments, this provides a global estimate of the amount of shift remediation thus far needed, and we incorporate that as a prior via the minimization objective. In the paper, we run extensive classification experiments on multiple data classes, over $40$ datasets, and with data batched over multiple sequence lengths. We extend the study to the $k$-fold cross-validation setting through a similar set of experiments. An ablation study exposes the method to varying amounts of shift and demonstrates slower degradation with $FIcsR$ in place. The results are promising under all these conditions, with accuracy improved over the batch and fold state-of-the-art by more than $5\%$ and $10\%$, respectively.
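The idea of accumulating a Fisher Information estimate over fragments and using it as a prior in the objective can be illustrated with a minimal sketch. The abstract does not give the paper's actual objective, so everything below is an assumption made for illustration: a logistic-regression model stands in for the neural network, the Fisher Information is approximated by its empirical diagonal, and the prior enters as a quadratic penalty in the style of elastic weight consolidation. The names `diagonal_fisher`, `penalized_loss`, `fisher_acc`, and `lam` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def diagonal_fisher(theta, X, y):
    """Empirical diagonal Fisher information of a logistic model on one fragment."""
    p = sigmoid(X @ theta)
    # Per-example gradient of the negative log-likelihood with respect to theta.
    grads = (p - y)[:, None] * X
    return np.mean(grads ** 2, axis=0)

def penalized_loss(theta, X, y, theta_prev, fisher_acc, lam=1.0):
    """Negative log-likelihood plus a Fisher-weighted quadratic prior (assumed form)."""
    p = np.clip(sigmoid(X @ theta), 1e-12, 1 - 1e-12)
    nll = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    penalty = 0.5 * lam * np.sum(fisher_acc * (theta - theta_prev) ** 2)
    return nll + penalty

rng = np.random.default_rng(0)
theta = np.zeros(3)        # parameters held fixed here; no training loop in this sketch
fisher_acc = np.zeros(3)   # running sum of per-fragment Fisher estimates

# Three synthetic fragments whose covariate mean drifts, mimicking covariate shift.
for shift in (0.0, 0.5, 1.0):
    X = rng.normal(loc=shift, size=(200, 3))
    y = (X[:, 0] > shift).astype(float)
    fisher_acc += diagonal_fisher(theta, X, y)

loss = penalized_loss(theta, X, y, theta_prev=theta, fisher_acc=fisher_acc)
```

In this sketch the accumulated `fisher_acc` grows with each fragment, so parameters that earlier fragments were sensitive to are penalized more strongly for moving. Whether $FIcsR$ weights its prior in exactly this way is not stated in the abstract.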
