CVLGJul 27, 2021

A Tale Of Two Long Tails

arXiv:2107.13098v125 citations
Originality Incremental advance
AI Analysis

This addresses the need for better uncertainty communication in ML-assisted decision-making, but it is incremental as it builds on existing uncertainty methods.

The paper tackled the problem of distinguishing between different sources of model uncertainty, such as atypical vs. noisy examples, and found that targeted data augmentation during training effectively characterizes and differentiates these sources, with results showing differing learning rates.

As machine learning models are increasingly employed to assist human decision-makers, it becomes critical to communicate the uncertainty associated with these model predictions. However, the majority of work on uncertainty has focused on traditional probabilistic or ranking approaches - where the model assigns low probabilities or scores to uncertain examples. While this captures what examples are challenging for the model, it does not capture the underlying source of the uncertainty. In this work, we seek to identify examples the model is uncertain about and characterize the source of said uncertainty. We explore the benefits of designing a targeted intervention - targeted data augmentation of the examples where the model is uncertain over the course of training. We investigate whether the rate of learning in the presence of additional information differs between atypical and noisy examples? Our results show that this is indeed the case, suggesting that well-designed interventions over the course of training can be an effective way to characterize and distinguish between different sources of uncertainty.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes