Is the User Enjoying the Conversation? A Case Study on the Impact on the Reward Function
This work addresses user satisfaction modeling for dialogue systems, offering incremental improvements in reward function inference.
The authors tackled the problem of estimating user satisfaction in task-oriented dialogue systems by using deep neural networks with distributed semantic representations, showing that their hierarchical network outperforms state-of-the-art quality estimators and improves task success rates when applied to reward functions in POMDPs.
The impact of user satisfaction in policy learning task-oriented dialogue systems has long been a subject of research interest. Most current models for estimating the user satisfaction either (i) treat out-of-context short-texts, such as product reviews, or (ii) rely on turn features instead of on distributed semantic representations. In this work we adopt deep neural networks that use distributed semantic representation learning for estimating the user satisfaction in conversations. We evaluate the impact of modelling context length in these networks. Moreover, we show that the proposed hierarchical network outperforms state-of-the-art quality estimators. Furthermore, we show that applying these networks to infer the reward function in a Partial Observable Markov Decision Process (POMDP) yields to a great improvement in the task success rate.