CL, Nov 2, 2018

Image Chat: Engaging Grounded Conversations

arXiv:1811.00945v2, 127 citations
AI Analysis

This work addresses the challenge of making AI conversations more engaging by grounding them in images and in emotional or style traits. The contribution is incremental, since it builds on existing methods and datasets.

The paper tackles the problem of engaging humans in image-grounded conversation by developing models that condition on an emotional mood or style trait. The models achieve state-of-the-art performance on the IGC task, and the best model is almost on par with humans on the Image-Chat test set, where it is preferred 47.7% of the time.

To achieve the long-term goal of machines being able to engage humans in conversation, our models should captivate the interest of their speaking partners. Communication grounded in images, whereby a dialogue is conducted based on a given photo, is a setup naturally appealing to humans (Hu et al., 2014). In this work we study large-scale architectures and datasets for this goal. We test a set of neural architectures using state-of-the-art image and text representations, considering various ways to fuse the components. To test such models, we collect a dataset of grounded human-human conversations, where speakers are asked to play roles given a provided emotional mood or style, as the use of such traits is also a key factor in engagingness (Guo et al., 2019). Our dataset, Image-Chat, consists of 202k dialogues over 202k images using 215 possible style traits. Automatic metrics and human evaluations of engagingness show the efficacy of our approach; in particular, we obtain state-of-the-art performance on the existing IGC task, and our best performing model is almost on par with humans on the Image-Chat test set (preferred 47.7% of the time).

Code Implementations: 3 repos