CL | Jul 26, 2023

Leveraging Implicit Feedback from Deployment Data in Dialogue

Richard Yuanzhe Pang, Stephen Roller, Kyunghyun Cho, He He, Jason Weston

arXiv:2307.14117v2

Originality: Incremental advance

AI Analysis

This work addresses the problem of improving dialogue quality for users of conversational AI. It is an incremental advance because it builds on existing deployment data and methods.

The paper tackles improving social conversational agents by learning from implicit feedback signals in deployment data, such as user response length and sentiment. It finds that optimizing for conversation length increased controversial generations, while optimizing for positive sentiment reduced such behaviors.

We study improving social conversational agents by learning from natural dialogue between users and a deployed model, without extra annotations. To implicitly measure the quality of a machine-generated utterance, we leverage signals like user response length, sentiment and reaction of the future human utterances in the collected dialogue episodes. Our experiments use the publicly released deployment data from BlenderBot (Xu et al., 2023). Human evaluation indicates improvements in our new models over baseline responses; however, we find that some proxy signals can lead to more generations with undesirable properties as well. For example, optimizing for conversation length can lead to more controversial or unfriendly generations compared to the baseline, whereas optimizing for positive sentiment or reaction can decrease these behaviors.
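The abstract's core idea, judging a bot utterance by the human turn that follows it, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code. The `Turn` schema, the `label_bot_turns` helper, the thresholds and the word-list `toy_sentiment` scorer are all hypothetical. The paper's sentiment and reaction signals come from models that are not reproduced here, and how the resulting labels are used for training is not shown.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    speaker: str  # "bot" or "human" (hypothetical schema, not the released data format)
    text: str


def reply_length(text: str) -> float:
    # Proxy signal: number of whitespace-separated words in the human reply.
    return float(len(text.split()))


# Toy stand-in for a sentiment model; NOT the classifier used in the paper.
POSITIVE = {"great", "love", "thanks", "cool", "nice", "haha"}
NEGATIVE = {"boring", "rude", "stop", "hate", "wrong"}


def toy_sentiment(text: str) -> float:
    # Count of positive words minus negative words, after lowercasing and stripping punctuation.
    words = [w.strip(".,!?").lower() for w in text.split()]
    return float(sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words))


def label_bot_turns(
    episode: List[Turn],
    signal: Callable[[str], float],
    threshold: float,
) -> List[Tuple[str, int]]:
    """Label each bot utterance 1 (good) or 0 (bad) from the human turn right after it.

    Bot turns with no following human turn (e.g. the last turn of an episode)
    are skipped, since there is no implicit feedback to read.
    """
    labeled = []
    for current, nxt in zip(episode, episode[1:]):
        if current.speaker == "bot" and nxt.speaker == "human":
            labeled.append((current.text, int(signal(nxt.text) >= threshold)))
    return labeled


# Usage on a hypothetical episode (invented for illustration, not from the BlenderBot data).
episode = [
    Turn("human", "Hi there!"),
    Turn("bot", "Hello! Do you have any hobbies?"),
    Turn("human", "I love hiking in the mountains with my dog every weekend, it is great"),
    Turn("bot", "Hiking is a sport."),
    Turn("human", "ok"),
]
print(label_bot_turns(episode, reply_length, threshold=5))
# [('Hello! Do you have any hobbies?', 1), ('Hiking is a sport.', 0)]
print(label_bot_turns(episode, toy_sentiment, threshold=1))
# [('Hello! Do you have any hobbies?', 1), ('Hiking is a sport.', 0)]
```

The `signal` callable is the design point: the same episodes can yield different labels depending on which proxy is plugged in. This mirrors the abstract's observation that optimizing for conversation length and optimizing for positive sentiment can push generations in different directions.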

