Mars: Modeling Context & State Representations with Contrastive Learning for End-to-End Task-Oriented Dialog
This work addresses the challenge of generating high-quality system responses in task-oriented dialog systems. The contribution is incremental: it builds on existing methods by refining how dialog context and state representations are modeled.
The paper aims to improve end-to-end task-oriented dialog systems by examining how dialog context representations affect the quality of the belief state and action state, and it proposes Mars, a system that uses two contrastive learning strategies. The results show that context representations that are more distinct from the state representations improve multi-turn dialog performance, and Mars achieves state-of-the-art results on the MultiWOZ 2.0, CamRest676, and CrossWOZ datasets.
Traditional end-to-end task-oriented dialog systems first convert the dialog context into a belief state and an action state before generating the system response. The quality of the system response is significantly affected by the quality of the belief state and action state. We first explore what kind of dialog context representation improves the quality of the belief state and action state, which in turn enhances the quality of the generated response. To carry out this exploration, we propose Mars, an end-to-end task-oriented dialog system with two contrastive learning strategies that model the relationship between dialog context and belief/action state representations. Empirical results show that dialog context representations that are more different from semantic state representations are more conducive to multi-turn task-oriented dialog. Moreover, Mars achieves state-of-the-art performance on MultiWOZ 2.0, CamRest676, and CrossWOZ.
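The abstract does not spell out the two contrastive objectives, so the following is only a generic sketch of what a contrastive loss over context and state representations can look like, not the Mars formulation. It assumes toy two-dimensional vectors, an in-batch InfoNCE loss in which the state at the same index as a context is arbitrarily treated as its positive, and a temperature of 0.1. The function names `cosine`, `info_nce`, and `mean_cross_similarity` are hypothetical. The second helper measures how similar the context and state groups are on average, which is the kind of quantity the empirical finding about "more different" representations refers to.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(contexts, states, temperature=0.1):
    """Generic in-batch InfoNCE loss (illustrative, not the paper's objective).

    Assumption: states[i] is the positive for contexts[i]; every other
    state in the batch acts as a negative.
    """
    losses = []
    for i, context in enumerate(contexts):
        logits = [cosine(context, state) / temperature for state in states]
        # Numerically stable log-sum-exp over all candidates.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-probability assigned to the positive candidate.
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


def mean_cross_similarity(contexts, states):
    """Average cosine similarity over all context/state pairs.

    Lower values mean the two groups of representations are more different.
    """
    sims = [cosine(c, s) for c in contexts for s in states]
    return sum(sims) / len(sims)


# Toy data (hypothetical): two dialog turns, 2-d representations.
contexts = [[1.0, 0.0], [0.0, 1.0]]
aligned_states = [[1.0, 0.0], [0.0, 1.0]]   # positives match their contexts
swapped_states = [[0.0, 1.0], [1.0, 0.0]]   # positives point the wrong way

loss_aligned = info_nce(contexts, aligned_states)
loss_swapped = info_nce(contexts, swapped_states)
print(loss_aligned < loss_swapped)
print(mean_cross_similarity(contexts, aligned_states))
```

In a real system the vectors would come from the encoder and decoder hidden states, and the choice of which pairs are pulled together or pushed apart is exactly what the two strategies in the paper differ on; the pairing used here is a placeholder.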