CL
Aug 30, 2022

Towards Boosting the Open-Domain Chatbot with Human Feedback

arXiv:2208.14165v1229 citations
h-index: 52
Originality: Incremental advance
AI Analysis

This work addresses chatbots' misalignment with human preferences, which matters to users seeking engaging conversations. It is an incremental advance in that it builds on existing pre-trained models.

The paper tackles the problem of open-domain chatbots generating unengaging responses. It proposes Diamante, an approach that collects two kinds of human feedback, explicit demonstrations and implicit preferences, to build a Chinese chit-chat dataset, and that trains generation and evaluation jointly. The authors report significant performance gains for Chinese pre-trained dialogue models.

Many open-domain dialogue models pre-trained with social media comments can generate coherent replies but have difficulties producing engaging responses when interacting with real users. This phenomenon might mainly result from the deficiency of annotated human-human conversations and the misalignment with human preference. In this paper, we propose a novel and efficient approach Diamante to boost the open-domain chatbot, where two kinds of human feedback (including explicit demonstration and implicit preference) are collected and leveraged. By asking annotators to select or amend the model-generated candidate responses, Diamante efficiently collects the human demonstrated responses and constructs a Chinese chit-chat dataset. To enhance the alignment with human preference, Diamante leverages the implicit preference in the data collection process and introduces the generation-evaluation joint training. Comprehensive experiments indicate that the Diamante dataset and joint training paradigm can significantly boost the performance of Chinese pre-trained dialogue models.
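The idea of generation-evaluation joint training can be illustrated with a small sketch. The abstract does not give the loss, so everything here is an assumption made for illustration: the function names, the logistic pairwise ranking term, and the `weight` parameter are hypothetical and are not taken from the paper. The sketch only shows how a generation objective (likelihood of the human-demonstrated response) might be combined with an evaluation objective (scoring the response the annotator chose above the candidates they passed over).

```python
import math

def nll_loss(token_log_probs):
    """Generation term: average negative log-likelihood of the
    human-demonstrated response, given per-token log-probabilities."""
    return -sum(token_log_probs) / len(token_log_probs)

def pairwise_preference_loss(preferred_score, other_score):
    """Evaluation term (assumed form): logistic ranking loss,
    log(1 + exp(-(preferred - other))). It shrinks as the preferred
    response is scored further above the other candidate."""
    return math.log1p(math.exp(-(preferred_score - other_score)))

def joint_loss(token_log_probs, preferred_score, other_scores, weight=1.0):
    """Hypothetical joint objective: generation loss plus a weighted
    average of pairwise preference losses against the rejected candidates."""
    preference = sum(
        pairwise_preference_loss(preferred_score, s) for s in other_scores
    ) / len(other_scores)
    return nll_loss(token_log_probs) + weight * preference

# Toy numbers: two response tokens, one chosen response, two rejected candidates.
loss = joint_loss([math.log(0.5), math.log(0.25)], 2.0, [0.5, -1.0])
```

In this sketch the implicit preference enters only through the ordering of scores: an annotator picking or amending one candidate implies it should outrank the others, with no separate rating step required.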

Code Implementations: 1 repo
