LG · Oct 11, 2021

Disturbing Target Values for Neural Network Regularization

arXiv:2110.05003v1
Originality: Incremental advance
AI Analysis

Aimed at practitioners, this work addresses overfitting in neural networks and offers incremental improvements over existing regularization methods.

The authors address overfitting in neural networks by proposing Directional DisturbLabel (DDL), which uses class probabilities to selectively disturb confident labels rather than disturbing labels at random as DisturbLabel does, along with related variants. On 6 classification and 8 regression datasets, these methods match or outperform existing regularization techniques such as DisturbLabel, L2 regularization, and Dropout.

Diverse regularization techniques, such as L2 regularization, Dropout, and DisturbLabel (DL), have been developed to prevent overfitting. DL, a newcomer on the scene, regularizes the loss layer by flipping a small share of the target labels at random and training the neural network on this distorted data so that it does not fit the training data too closely. High-confidence labels during training are observed to cause overfitting, yet DL selects the labels to disturb at random, regardless of their confidence. To address this shortcoming of DL, we propose Directional DisturbLabel (DDL), a novel regularization technique that uses the class probabilities to infer the confident labels and uses these labels to regularize the model. This active regularization uses the model's behavior during training to regularize it in a more directed manner. To address regression problems, we also propose DisturbValue (DV) and DisturbError (DE). DE uses only predefined confident labels to disturb target values. DV, like DL, injects noise into a randomly chosen portion of the target values. We validate the robustness of our methods on 6 classification datasets and 8 regression datasets. Finally, we demonstrate that our methods are comparable to or outperform DisturbLabel, L2 regularization, and Dropout. We also achieve the best performance on more than half of the datasets by combining our methods with either L2 regularization or Dropout.
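The contrast between random and confidence-directed disturbance can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how confidence is measured, how many labels are disturbed, or what noise is used. The function names, the parameters `alpha`, `threshold`, `tol`, and `sigma`, the choice of Gaussian noise, and the reading of "confident" as "true-class probability above a threshold" (classification) or "small prediction error" (regression) are all assumptions made for illustration.

```python
import random


def disturb_label(labels, num_classes, alpha, rng):
    """DL-style: with probability alpha, replace a label by a uniformly
    drawn class, regardless of how confident the model is."""
    return [rng.randrange(num_classes) if rng.random() < alpha else y
            for y in labels]


def directional_disturb_label(labels, probs, alpha, threshold, rng):
    """DDL-style sketch (assumed rule): only labels whose true-class
    probability is at least `threshold` are candidates for disturbance."""
    disturbed = []
    for y, p in zip(labels, probs):
        confident = p[y] >= threshold
        if confident and rng.random() < alpha:
            # Assumption: the replacement is drawn uniformly from the other classes.
            others = [c for c in range(len(p)) if c != y]
            disturbed.append(rng.choice(others))
        else:
            disturbed.append(y)
    return disturbed


def disturb_value(targets, alpha, sigma, rng):
    """DV-style: add noise to a random share alpha of the regression targets.
    Gaussian noise is an assumption."""
    return [t + rng.gauss(0.0, sigma) if rng.random() < alpha else t
            for t in targets]


def disturb_error(targets, preds, tol, sigma, rng):
    """DE-style sketch (assumed rule): disturb only targets that the model
    already predicts to within `tol`."""
    return [t + rng.gauss(0.0, sigma) if abs(t - p) <= tol else t
            for t, p in zip(targets, preds)]
```

In a training loop, functions like these would be applied to each mini-batch's targets before the loss is computed, with `probs` or `preds` taken from the model's current forward pass. Under these assumptions, the directed variants leave uncertain or poorly fit examples untouched and perturb only those the model already fits confidently.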

Code Implementations: 1 repo
