HC AI LG ML

May 29, 2020

Large-scale Hybrid Approach for Predicting User Satisfaction with Conversational Agents

Dookun Park, Hao Yuan, Dongmin Kim, Yinglei Zhang, Matsoukas Spyros, Young-Bum Kim, Ruhi Sarikaya, Edward Guo, Yuan Ling, Kevin Quinn, Pham Hung, Benjamin Yao

arXiv:2006.07113v1 14.718 citations

Originality Incremental advance

AI Analysis

This work addresses the challenge of scalable user satisfaction measurement for large-scale conversational agent systems, presenting an incremental improvement by combining complementary data sources.

The paper assesses user satisfaction with conversational agents through a hybrid approach that fuses explicit user feedback with predictions from two machine-learned models, one trained on user feedback data and the other on human annotation data. Experiments on Amazon Alexa datasets show significant improvements in inferring user satisfaction.

Measuring user satisfaction is a challenging task and a critical component in developing large-scale conversational agent systems that serve the needs of real users. A widely used approach is to collect human annotation data and use it for evaluation or modeling. Human annotation based approaches are easier to control, but hard to scale. A novel alternative is to collect users' direct feedback via a feedback elicitation system embedded in the conversational agent system, and to use the collected feedback to train a machine-learned model for generalization. User feedback is the best proxy for user satisfaction, but it is not available for some ineligible intents and in certain situations. Thus, these two types of approaches are complementary. In this work, we tackle the user satisfaction assessment problem with a hybrid approach that fuses explicit user feedback with user satisfaction predictions inferred by two machine-learned models, one trained on user feedback data and the other on human annotation data. The hybrid approach is based on a waterfall policy, and experimental results on Amazon Alexa's large-scale datasets show significant improvements in inferring user satisfaction. This paper presents a detailed hybrid architecture, an in-depth analysis of user feedback data, and an algorithm that generates data sets to properly simulate live traffic.
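The abstract does not spell out the waterfall policy, so the following is only a minimal sketch of how such a fallback chain might look. It assumes the order suggested by the text: explicit feedback first, then the feedback-trained model for eligible intents, then the annotation-trained model. The names `Turn`, `waterfall_satisfaction`, `feedback_model`, `annotation_model`, and `eligible_intents` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple


@dataclass
class Turn:
    """One user turn (hypothetical structure, for illustration only)."""
    utterance: str
    intent: str
    # True = satisfied, False = dissatisfied, None = no explicit feedback given
    explicit_feedback: Optional[bool] = None


def waterfall_satisfaction(
    turn: Turn,
    feedback_model: Callable[[Turn], float],
    annotation_model: Callable[[Turn], float],
    eligible_intents: Set[str],
) -> Tuple[str, float]:
    """Return (source, score), taking the first source that applies.

    The ordering here is an assumption, not the paper's stated policy.
    """
    # 1. Explicit feedback, when the user actually gave it.
    if turn.explicit_feedback is not None:
        return ("explicit_feedback", 1.0 if turn.explicit_feedback else 0.0)
    # 2. Model trained on user feedback, only for intents it covers.
    if turn.intent in eligible_intents:
        return ("feedback_model", feedback_model(turn))
    # 3. Model trained on human annotation as the final fallback.
    return ("annotation_model", annotation_model(turn))
```

Returning the source alongside the score makes it possible to check how much traffic each stage of the waterfall handles.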
