Duplex Conversation: Towards Human-like Interaction in Spoken Dialogue Systems
This work aims to improve customer service efficiency and user experience for telephone-based agents. It appears incremental, since it builds on existing concepts such as full-duplex communication.
The paper tackles the problem of creating human-like interactions in spoken dialogue systems by introducing Duplex Conversation, which uses full-duplex communication and three subtasks to enable smooth turn-taking. Experimental results show the system reduces response latency by 50% in online A/B tests.
In this paper, we present Duplex Conversation, a multi-turn, multimodal spoken dialogue system that enables telephone-based agents to interact with customers as a human would. We use the concept of full-duplex communication from telecommunications to illustrate what a human-like interactive experience should be and how to achieve smooth turn-taking through three subtasks: user state detection, backchannel selection, and barge-in detection. In addition, we propose semi-supervised learning with multimodal data augmentation, which leverages unlabeled data to improve model generalization. Experimental results on the three subtasks show that the proposed method achieves consistent improvements over the baselines. We deploy Duplex Conversation in Alibaba's intelligent customer service and share lessons learned in production. Online A/B experiments show that the proposed system reduces response latency by 50%.
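To make the interplay of the three subtasks concrete, the following is a minimal sketch of how their outputs might be combined into a single turn-taking decision. It is not the authors' implementation: the names `UserState`, `Action`, `Frame`, and `decide`, the label set, and the decision rules are all hypothetical and chosen only for illustration. The three models themselves are stubbed out as precomputed fields.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class UserState(Enum):
    # Hypothetical label set; the abstract does not list the actual classes.
    TURN_SWITCH = "turn_switch"  # user has finished; agent may take the turn
    TURN_KEEP = "turn_keep"      # user paused but intends to continue
    HESITATION = "hesitation"    # user is hesitating mid-utterance


class Action(Enum):
    RESPOND = "respond"
    BACKCHANNEL = "backchannel"
    WAIT = "wait"
    STOP_SPEAKING = "stop_speaking"
    KEEP_SPEAKING = "keep_speaking"


@dataclass
class Frame:
    """Assumed outputs of the three subtask models for one audio segment."""
    agent_is_speaking: bool
    user_state: UserState              # from user state detection
    is_barge_in: bool                  # from barge-in detection
    backchannel: Optional[str] = None  # from backchannel selection


def decide(frame: Frame) -> Tuple[Action, Optional[str]]:
    """Map subtask outputs to an agent action (illustrative rules only)."""
    if frame.agent_is_speaking:
        # While the agent talks, only a detected barge-in interrupts it.
        if frame.is_barge_in:
            return Action.STOP_SPEAKING, None
        return Action.KEEP_SPEAKING, None
    if frame.user_state is UserState.TURN_SWITCH:
        return Action.RESPOND, None
    if frame.user_state is UserState.TURN_KEEP and frame.backchannel is not None:
        # Acknowledge the user without taking the turn.
        return Action.BACKCHANNEL, frame.backchannel
    return Action.WAIT, None
```

For example, `decide(Frame(False, UserState.TURN_KEEP, False, "uh-huh"))` yields a backchannel action under these assumed rules, while a detected barge-in during agent speech yields `STOP_SPEAKING`. A deployed system would derive these fields from streaming audio and text rather than receive them precomputed.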