When does loss-based prioritization fail?
This work addresses inefficient training protocols for practitioners working with real-world, noisy datasets, and highlights a critical limitation of existing acceleration techniques.
The paper investigates the failure of loss-based prioritization methods in accelerating deep neural network training when applied to noisy and corrupted data, demonstrating empirically that these methods degrade performance under such conditions.
Not all examples are created equal, but standard deep neural network training protocols treat each training point uniformly. Each example is propagated forward and backward through the network the same number of times, regardless of how much it contributes to learning. Recent work has proposed ways to accelerate training by deviating from this uniform treatment. Popular methods up-weight examples that contribute more to the loss, on the intuition that the model has already learned low-loss examples, so their marginal value to training should be lower. This view assumes that updating the model with high-loss examples will be beneficial. However, this may not hold for noisy, real-world data. In this paper, we theorize and then empirically demonstrate that loss-based acceleration methods degrade in scenarios with noisy and corrupted data. Our work suggests that measures of example difficulty need to correctly separate noise from other types of challenging examples.
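To make the failure mode concrete, the following toy sketch ranks examples by per-example cross-entropy loss and keeps the highest-loss ones, which is the general idea behind loss-based prioritization. It is not the paper's method or experimental setup: the predicted probabilities, the labels, the two flipped labels, and the helper names `per_example_cross_entropy` and `select_high_loss` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def per_example_cross_entropy(probs, labels):
    """Cross-entropy loss of each example given predicted class probabilities."""
    eps = 1e-12  # guards against log(0)
    return -np.log(probs[np.arange(len(labels)), labels] + eps)

def select_high_loss(losses, k):
    """Indices of the k highest-loss examples, highest loss first."""
    return np.argsort(-losses)[:k]

# Hypothetical predictions from a partially trained binary classifier
# (made-up numbers, not results from the paper).
probs = np.array([
    [0.90, 0.10],
    [0.80, 0.20],
    [0.20, 0.80],
    [0.60, 0.40],  # a genuinely hard example: the model is wrong here
    [0.10, 0.90],
    [0.95, 0.05],
])
clean_labels = np.array([0, 0, 1, 1, 1, 0])

# Assumed label noise: flip the labels of examples 1 and 4.
noisy_labels = clean_labels.copy()
noisy_labels[[1, 4]] = 1 - noisy_labels[[1, 4]]

# Loss-based prioritization only sees the noisy labels.
losses = per_example_cross_entropy(probs, noisy_labels)
selected = select_high_loss(losses, k=2)
is_corrupted = noisy_labels != clean_labels

print("selected indices:", selected.tolist())
print("selected are corrupted:", is_corrupted[selected].tolist())
```

In this constructed case both selected examples are the mislabeled ones, while the clean but difficult example (index 3) is ranked below them. Loss alone cannot tell the two kinds of high-loss example apart, which is the distinction the abstract argues a difficulty measure needs to make.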