Uniform Learning in a Deep Neural Network via "Oddball" Stochastic Gradient Descent
This work addresses uneven learning in deep neural networks, an issue relevant to both researchers and practitioners, but it appears incremental because it builds on a recently introduced method.
The authors tackle the problem of non-uniform training error in deep neural networks by using "oddball SGD", which sets each example's training frequency in proportion to its error magnitude, to enforce uniform error across the training set.
When training deep neural networks, it is typically assumed that the training examples are uniformly difficult to learn. Put another way, it is assumed that the training error will be uniformly distributed across the training examples. Based on this assumption, each training example is used an equal number of times. However, the assumption may not be valid in many cases. "Oddball SGD" (novelty-driven stochastic gradient descent) was recently introduced to drive training probabilistically according to the error distribution: training frequency is proportional to training error magnitude. In this article, using a deep neural network to encode a video, we show that oddball SGD can be used to enforce uniform error across the training set.
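The selection rule described above (training frequency proportional to training error magnitude) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `selection_probabilities` and `sample_batch`, the use of NumPy, the uniform fallback when all errors are zero, and the toy error values are all assumptions made for illustration.

```python
import numpy as np


def selection_probabilities(errors):
    """Turn per-example error magnitudes into a sampling distribution.

    Hypothetical helper: each example's probability is its share of the
    total absolute error.
    """
    errors = np.abs(np.asarray(errors, dtype=float))
    total = errors.sum()
    if total == 0.0:
        # Assumption: with no error anywhere, fall back to uniform selection.
        return np.full(errors.shape, 1.0 / errors.size)
    return errors / total


def sample_batch(errors, batch_size, rng):
    """Draw example indices with probability proportional to error magnitude."""
    p = selection_probabilities(errors)
    return rng.choice(len(p), size=batch_size, replace=True, p=p)


# Toy data (made up): the last example has a much larger error than the rest.
rng = np.random.default_rng(0)
errors = np.array([0.1, 0.1, 0.1, 0.7])
probs = selection_probabilities(errors)
batch = sample_batch(errors, batch_size=1000, rng=rng)
counts = np.bincount(batch, minlength=len(errors))
```

In this sketch the high-error example is drawn far more often than the others. In an actual training loop the per-example errors would be re-measured as the network changes, so the sampling distribution would flatten as the errors even out.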