CL · Sep 1, 2021

Evaluating Predictive Uncertainty under Distributional Shift on Dialogue Dataset

arXiv:2109.00186v1
Originality: Synthesis-oriented
AI Analysis

This work addresses robust uncertainty estimation for dialogue systems that face distributional shift in real-world use. Its contribution is incremental: it focuses on evaluation methods rather than new models.

The paper evaluates predictive uncertainty in open-domain dialogue under distributional shift. It proposes two corruption methods, Unknown Word (UW) and Insufficient Context (IC), to simulate gradual shifts, and finds that existing uncertainty estimation methods consistently degrade in accuracy and calibration as the shift intensifies.

In open-domain dialogues, predictive uncertainty is mainly evaluated in a domain-shift setting to cope with out-of-distribution inputs. However, real-world conversations can contain a broader range of distributionally shifted inputs than out-of-distribution ones alone. To evaluate this, we first propose two methods, Unknown Word (UW) and Insufficient Context (IC), which induce gradual distributional shifts by corrupting the dialogue dataset. We then investigate the effect of distributional shifts on accuracy and calibration. Our experiments show that the performance of existing uncertainty estimation methods consistently degrades as the shift intensifies. The results suggest that the proposed methods could be useful for evaluating the calibration of dialogue systems under distributional shifts.
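The abstract does not spell out how UW and IC corrupt the data or which calibration metric is used, so the following is only a minimal sketch under stated assumptions. It assumes UW replaces a random fraction of tokens with a placeholder, IC keeps only the most recent context turns, and calibration is measured with expected calibration error (ECE), a common choice that the paper may or may not use. The names `UNK`, `unknown_word`, `insufficient_context`, and `expected_calibration_error` are hypothetical and do not come from the paper.

```python
import random

UNK = "<unk>"  # hypothetical placeholder token; the paper's choice is not stated here


def unknown_word(tokens, ratio, rng):
    """Assumed UW corruption: replace a `ratio` fraction of tokens with UNK."""
    n = round(len(tokens) * ratio)
    chosen = set(rng.sample(range(len(tokens)), n))
    return [UNK if i in chosen else tok for i, tok in enumerate(tokens)]


def insufficient_context(turns, keep):
    """Assumed IC corruption: keep only the last `keep` turns of the context."""
    return list(turns[-keep:]) if keep > 0 else []


def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin-size-weighted mean of |accuracy - mean confidence| over equal-width bins."""
    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; a confidence of exactly 0.0 goes into the first bin.
        members = [
            i for i, c in enumerate(confidences)
            if lo < c <= hi or (b == 0 and c == 0.0)
        ]
        if not members:
            continue
        accuracy = sum(correct[i] for i in members) / len(members)
        mean_conf = sum(confidences[i] for i in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece
```

Under these assumptions, a gradual shift would correspond to sweeping `ratio` upward (UW) or `keep` downward (IC) and recomputing accuracy and ECE at each level.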

