Knowing What You Know: Calibrating Dialogue Belief State Distributions via Ensembles
This work addresses unreliable confidence estimates in dialogue systems, a problem that matters for robust conversational AI. The contribution is incremental, however, since it builds on existing ensemble methods.
The paper tackles poorly calibrated belief distributions in multi-domain dialogue belief trackers and, using a calibrated ensemble of models, achieves state-of-the-art calibration together with improved accuracy.
The ability to accurately track what happens during a conversation is essential for the performance of a dialogue system. Current state-of-the-art multi-domain dialogue state trackers achieve just over 55% accuracy on the current standard benchmark. This means that in roughly 45% of dialogue turns, almost every second turn, they place full confidence in an incorrect dialogue state. Belief trackers, on the other hand, maintain a distribution over possible dialogue states. However, they lag behind dialogue state trackers in performance and do not produce well-calibrated distributions. In this work we present state-of-the-art calibration performance for multi-domain dialogue belief trackers, obtained with a calibrated ensemble of models. The resulting dialogue belief tracker also outperforms previous dialogue belief tracking models in accuracy.
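To make "calibration" and "ensemble" concrete, the sketch below averages the per-turn belief distributions of several ensemble members and scores the result with expected calibration error (ECE), a common metric that compares confidence to accuracy within confidence bins. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the plain probability averaging, the 10 equal-width bins, and the toy data are all hypothetical choices, and the paper's own calibration procedure may differ.

```python
import numpy as np


def ensemble_belief(member_probs):
    """Average belief distributions across ensemble members.

    Assumption: member_probs has shape (n_members, n_turns, n_values),
    and each member's per-turn distribution sums to 1.
    """
    return np.mean(np.asarray(member_probs, dtype=float), axis=0)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE with equal-width confidence bins (a hypothetical helper).

    probs: (n_turns, n_values) belief distributions.
    labels: (n_turns,) index of the correct value in each turn.
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    confidences = probs.max(axis=1)          # confidence of the top value
    predictions = probs.argmax(axis=1)       # the value the tracker would pick
    correct = (predictions == labels).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            # Weighted gap between accuracy and mean confidence in this bin.
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap
    return ece


if __name__ == "__main__":
    labels = np.array([0, 1])
    # A single overconfident tracker: full confidence in value 0 both turns.
    single = np.array([[1.0, 0.0], [1.0, 0.0]])
    # A second, hypothetical member that is less certain.
    other = np.array([[0.7, 0.3], [0.3, 0.7]])
    ensemble = ensemble_belief([single, other])
    print("single ECE:  ", round(expected_calibration_error(single, labels), 3))
    print("ensemble ECE:", round(expected_calibration_error(ensemble, labels), 3))
```

On this toy data the single tracker is fully confident yet only half right, giving an ECE of 0.5, while the averaged ensemble spreads its confidence and scores lower (about 0.4). Averaging alone rarely yields perfect calibration, which is why a separate calibration step on top of the ensemble can still help.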