Turn-level Dialog Evaluation with Dialog-level Weak Signals for Bot-Human Hybrid Customer Service Systems
This work addresses the need for scalable evaluation of bot-human hybrid customer service systems, though it appears incremental because it builds on existing neural network and reward-based approaches.
The paper tackles the problem of evaluating customer service interactions by developing a machine learning model, Value Profiler, that quantifies success at the turn level using weak dialog-level signals, and shows improvements in Amazon customer service quality.
We developed a machine learning approach that quantifies multiple aspects of success, or value, in customer service contacts at any time during the interaction. Specifically, the value/reward function for turn-level behaviors across human agents, chatbots, and other hybrid dialog systems is characterized by the incremental information and confidence gain between sentences, based on token-level predictions from a multi-task neural network trained with only weak signals from dialog-level attributes/states. The resulting model, named Value Profiler, serves as a goal-oriented dialog manager that enhances conversations by regulating automated decisions with its reward and state predictions. It supports both real-time monitoring and scalable offline customer experience evaluation, for both bot-handled and human-handled contacts. We show how it improves Amazon customer service quality in several applications.
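
The abstract does not give the reward formula, so the following is only an illustrative sketch of one way a turn-level reward could be derived from token-level predictions of a single binary dialog-level attribute. It assumes that "confidence gain" is the change in the predicted probability between the ends of consecutive turns, and that "information gain" is the corresponding drop in prediction entropy. The function names (`binary_entropy`, `turn_rewards`), the uniform prior of 0.5, and the example probabilities are all hypothetical and are not taken from the paper.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) prediction (0 when p is 0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def turn_rewards(token_probs, turn_ends, prior=0.5):
    """Hypothetical turn-level rewards from token-level predictions.

    token_probs: predicted probability of a dialog-level outcome after each token.
    turn_ends:   index of the last token of each turn, in increasing order.
    prior:       assumed prediction before any token is seen.

    Returns one (confidence_gain, information_gain) pair per turn.
    """
    rewards = []
    prev = prior
    for end in turn_ends:
        cur = token_probs[end]
        confidence_gain = cur - prev  # shift in the predicted probability
        information_gain = binary_entropy(prev) - binary_entropy(cur)  # entropy drop
        rewards.append((confidence_gain, information_gain))
        prev = cur
    return rewards


# Made-up example: five tokens split into three turns.
example_probs = [0.5, 0.5, 0.75, 0.75, 1.0]
example_turn_ends = [1, 3, 4]
example_rewards = turn_rewards(example_probs, example_turn_ends)
```

Under these assumptions the per-turn information gains telescope, so their sum equals the entropy of the prior minus the entropy of the final prediction. This gives a simple sanity check on any turn segmentation.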