CL | AI | ML | Jan 21, 2020

Improving Interaction Quality Estimation with BiLSTMs and the Impact on Dialogue Policy Learning

arXiv:2001.07615v1 | 998 citations
AI Analysis

This work addresses the challenge of learning better dialogue behaviour in spoken dialogue systems by rewarding estimated user satisfaction rather than task success, which could lead to more robust and user-friendly conversational agents.

The researchers developed a novel BiLSTM-based user satisfaction estimator that outperforms previous estimators while learning temporal dependencies implicitly, and used it as the reward source for dialogue policy learning. In simulated experiments, the estimator was trained on one domain and applied in several other domains covering a similar task, which resulted in higher estimated satisfaction, similar task success rates, and greater robustness to noise.

Learning suitable and well-performing dialogue behaviour in statistical spoken dialogue systems has been a focus of research for many years. While most work based on reinforcement learning employs an objective measure like task success for modelling the reward signal, we use a reward based on user satisfaction estimation. We propose a novel estimator and show that it outperforms all previous estimators while learning temporal dependencies implicitly. Furthermore, we apply this novel user satisfaction estimation model live in simulated experiments where the satisfaction estimation model is trained on one domain and applied in many other domains which cover a similar task. We show that applying this model results in higher estimated satisfaction, similar task success rates and a higher robustness to noise.
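
The contrast between a task-success reward and a satisfaction-based reward can be made concrete with a small sketch. It is an illustration only and is not taken from the paper: the `DialogueOutcome` type, the per-turn penalty, the 1-to-5 satisfaction scale, and the scaling constants are all assumptions, chosen so that both rewards span the same numeric range.

```python
from dataclasses import dataclass


@dataclass
class DialogueOutcome:
    """Hypothetical summary of one finished dialogue."""
    num_turns: int        # dialogue length in turns
    task_success: bool    # objective outcome: was the user goal met?
    satisfaction: float   # estimator output at the final turn, assumed in [1, 5]


# Assumed constants (not from the source); both bonuses top out at 20.
TURN_PENALTY = 1.0
SUCCESS_BONUS = 20.0
SATISFACTION_SCALE = 5.0


def task_success_reward(outcome: DialogueOutcome) -> float:
    """Objective reward: per-turn penalty plus a fixed bonus on success."""
    bonus = SUCCESS_BONUS if outcome.task_success else 0.0
    return -TURN_PENALTY * outcome.num_turns + bonus


def satisfaction_reward(outcome: DialogueOutcome) -> float:
    """Reward driven by estimated satisfaction instead of task success."""
    if not 1.0 <= outcome.satisfaction <= 5.0:
        raise ValueError("satisfaction is assumed to lie in [1, 5]")
    graded_bonus = SATISFACTION_SCALE * (outcome.satisfaction - 1.0)
    return -TURN_PENALTY * outcome.num_turns + graded_bonus
```

Under these assumed constants, an 8-turn successful dialogue with an estimated satisfaction of 3 scores 12 with the task-success reward but only 2 with the satisfaction-based reward. The graded signal distinguishes a dialogue that merely completed the task from one the user was happy with, which a binary success flag cannot do.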

