Batch Inverse-Variance Weighting: Deep Heteroscedastic Regression
This work addresses label noise in machine learning applications where labeling accuracy varies from sample to sample. The contribution is incremental, however, as it adapts an existing statistical method to neural networks.
The paper tackles heteroscedastic regression in supervised learning, where label noise varies per sample. It proposes Batch Inverse-Variance (BIV), a loss function that adapts inverse-variance weighting to neural networks. Experimental results show that BIV significantly improves performance on two noisy datasets compared to baseline methods such as L2 loss and inverse-variance weighting.
Heteroscedastic regression is the supervised learning task in which each label is subject to noise from a different distribution. This noise can be caused by the labelling process, and it negatively impacts the performance of the learning algorithm because it violates the i.i.d. assumption. In many situations, however, the labelling process can estimate the variance of this distribution for each label, which can be used as additional information to mitigate the impact. We adapt an inverse-variance weighted mean square error, based on the Gauss-Markov theorem, for parameter optimization in neural networks. We introduce Batch Inverse-Variance, a loss function that is robust to near-ground-truth samples and allows control of the effective learning rate. Our experimental results show that BIV significantly improves the performance of the networks on two noisy datasets, compared to L2 loss, inverse-variance weighting, and a filtering-based baseline.
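The text above does not give the loss formula, so the following is only a minimal sketch of how an inverse-variance weighted loss of this kind might look. It assumes two things that are not stated here: that a small constant (the hypothetical `epsilon`) is added to each variance so that near-ground-truth labels do not receive unbounded weight, and that the weights are normalized by their sum over the mini-batch so that the overall scale of the loss, and hence the effective learning rate, stays controlled. The function names `iv_mse` and `biv_loss` are illustrative, not taken from the paper.

```python
from typing import Sequence


def iv_mse(preds: Sequence[float], labels: Sequence[float],
           variances: Sequence[float]) -> float:
    """Plain inverse-variance weighted squared error, averaged over the batch.

    Each squared error is divided by the label's noise variance, so the
    loss grows without bound as any variance approaches zero.
    """
    terms = [(p - y) ** 2 / v for p, y, v in zip(preds, labels, variances)]
    return sum(terms) / len(terms)


def biv_loss(preds: Sequence[float], labels: Sequence[float],
             variances: Sequence[float], epsilon: float = 0.1) -> float:
    """Batch-normalized inverse-variance loss (assumed form, see lead-in).

    epsilon (hypothetical parameter) bounds the weight of low-variance
    labels; dividing by the sum of weights makes the weights sum to one
    within the mini-batch.
    """
    weights = [1.0 / (v + epsilon) for v in variances]
    total = sum(weights)
    weighted = sum(w * (p - y) ** 2 for w, p, y in zip(weights, preds, labels))
    return weighted / total


# Example mini-batch: the second label is noisier and gets less weight.
example = biv_loss(preds=[1.0, 3.0], labels=[0.0, 0.0], variances=[0.5, 4.0])
```

Under these assumptions, a batch with equal variances reduces to the ordinary mean squared error, and a label with zero reported variance still yields a finite loss because of `epsilon`.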