Joint Turn and Dialogue level User Satisfaction Estimation on Multi-Domain Conversations
This work addresses the need for generalizable dialogue quality estimation in data-driven dialogue management, although its gains over existing methods are incremental.
The paper tackles the problem of estimating user satisfaction in multi-domain conversations. It proposes a joint turn- and dialogue-level model that eliminates hand-crafted features and achieves up to a 27% absolute improvement (0.43 to 0.70) in linear correlation over the baselines.
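Because the reported metric is a correlation coefficient, "27% absolute" is best read as a gain of 0.27 in the coefficient itself, not as a relative change. The sketch below assumes "linear correlation" means Pearson's r between predicted and user-given dialogue ratings (the summary does not name the estimator) and contrasts absolute with relative improvement using the figures quoted in the abstract. The helper names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson linear correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def absolute_gain(baseline, model):
    """Absolute improvement: the plain difference of two correlation values."""
    return model - baseline


def relative_gain(baseline, model):
    """Relative improvement, expressed as a fraction of the baseline."""
    return (model - baseline) / baseline


# Correlation values quoted in the abstract (baseline DNN -> joint model).
print(round(absolute_gain(0.43, 0.70), 2))  # 0.27
print(round(relative_gain(0.43, 0.70), 2))  # 0.63
```

The same 0.43 to 0.70 change is therefore a 0.27 absolute gain but roughly a 63% relative gain, which is why stating the absolute-versus-relative convention matters when comparing results across papers.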
Dialogue-level quality estimation is vital for optimizing data-driven dialogue management. Current automated methods for estimating turn- and dialogue-level user satisfaction employ hand-crafted features and rely on complex annotation schemes, which reduce the generalizability of the trained models. We propose a novel user satisfaction estimation approach that minimizes an adaptive multi-task loss function to jointly predict turn-level Response Quality labels provided by experts and explicit dialogue-level ratings provided by end users. The proposed BiLSTM-based deep neural network automatically weighs each turn's contribution to the estimated dialogue-level rating, implicitly encodes temporal dependencies, and removes the need to hand-craft features. On dialogues sampled from 28 Alexa domains, two dialogue systems, and three user groups, the joint dialogue-level satisfaction estimation model achieved absolute improvements in linear correlation of up to 27% (0.43 to 0.70) and 7% (0.63 to 0.70) over the baseline deep neural network and the benchmark gradient boosting regression model, respectively.
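The abstract does not give the exact turn-weighting mechanism or loss. The sketch below illustrates two plausible pieces under stated assumptions. (a) Each turn has already been encoded into a vector (for example, by the BiLSTM), and turns are weighted with softmax dot-product attention against a learned query vector. (b) The adaptive multi-task loss uses the homoscedastic-uncertainty weighting of Kendall et al. (2018), applied here to both the turn-level and dialogue-level losses for simplicity. Both choices are assumptions for illustration, not the authors' confirmed method. Names such as `attention_pool`, `query`, `log_var_turn`, and `log_var_dialogue` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(turn_states, query):
    """Weigh each turn's vector by its dot-product score against `query`.

    turn_states: one vector per turn (assumed here to be BiLSTM outputs).
    query: a learned context vector (hypothetical parameter).
    Returns (per-turn weights, weighted-sum dialogue vector).
    """
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in turn_states]
    weights = softmax(scores)
    dim = len(turn_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, turn_states)) for d in range(dim)]
    return weights, pooled


def adaptive_multitask_loss(turn_loss, dialogue_loss, log_var_turn, log_var_dialogue):
    """Uncertainty-weighted sum of two task losses (Kendall et al., 2018 style).

    With s = log(sigma^2), each task contributes 0.5 * exp(-s) * loss + 0.5 * s,
    so the task weights exp(-s) can be learned jointly with the network.
    """
    return (0.5 * math.exp(-log_var_turn) * turn_loss + 0.5 * log_var_turn
            + 0.5 * math.exp(-log_var_dialogue) * dialogue_loss + 0.5 * log_var_dialogue)


# Toy usage: two turns, where the query favours the first turn.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
print([round(w, 3) for w in weights])  # [0.731, 0.269]
print(adaptive_multitask_loss(1.0, 2.0, 0.0, 0.0))  # 1.5
```

Learning `log_var_turn` and `log_var_dialogue` lets the optimizer shift emphasis between the turn-level and dialogue-level objectives instead of fixing that trade-off by hand. The `0.5 * log_var` terms keep it from shrinking both weights toward zero.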