Going Beyond One-Hot Encoding in Classification: Can Human Uncertainty Improve Model Performance?
This work addresses the issue of unreliable labels in machine learning for domain experts, though it is incremental as it builds on existing calibration methods.
The paper tackles the problem of neglecting human uncertainty in labeling processes for deep learning by embedding label uncertainty into training via distributional labels. The result is improved generalization and model performance on a remote sensing image classification dataset, with better-calibrated probabilities leading to more certain predictions.
Technological and computational advances continuously drive forward the broad field of deep learning. In recent years, the derivation of quantities describing theuncertainty in the prediction - which naturally accompanies the modeling process - has sparked general interest in the deep learning community. Often neglected in the machine learning setting is the human uncertainty that influences numerous labeling processes. As the core of this work, label uncertainty is explicitly embedded into the training process via distributional labels. We demonstrate the effectiveness of our approach on image classification with a remote sensing data set that contains multiple label votes by domain experts for each image: The incorporation of label uncertainty helps the model to generalize better to unseen data and increases model performance. Similar to existing calibration methods, the distributional labels lead to better-calibrated probabilities, which in turn yield more certain and trustworthy predictions.