IR · AI · CL · Jul 6, 2021

PhotoChat: A Human-Human Dialogue Dataset with Photo Sharing Behavior for Joint Image-Text Modeling

arXiv:2108.01453v1712 citations
Originality: Synthesis-oriented
AI Analysis

This dataset addresses the problem of joint image-text modeling for researchers in AI and natural language processing, but it is incremental as it builds on existing multimodal datasets.

The authors introduced PhotoChat, a dataset of 12k human-human dialogues with shared photos, to study photo-sharing behavior in online messaging, and proposed two tasks: photo-sharing intent prediction and photo retrieval, with baseline models achieving 58.1% F1 score and 10.4% recall@1, respectively.

We present a new human-human dialogue dataset - PhotoChat, the first dataset that casts light on the photo sharing behavior in online messaging. PhotoChat contains 12k dialogues, each of which is paired with a user photo that is shared during the conversation. Based on this dataset, we propose two tasks to facilitate research on image-text modeling: a photo-sharing intent prediction task that predicts whether one intends to share a photo in the next conversation turn, and a photo retrieval task that retrieves the most relevant photo according to the dialogue context. In addition, for both tasks, we provide baseline models using the state-of-the-art models and report their benchmark performances. The best image retrieval model achieves 10.4% recall@1 (out of 1000 candidates) and the best photo intent prediction model achieves 58.1% F1 score, indicating that the dataset presents interesting yet challenging real-world problems. We are releasing PhotoChat to facilitate future research work among the community.
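The two metrics quoted in the abstract, recall@1 for photo retrieval and F1 for photo-sharing intent prediction, can be illustrated with a minimal Python sketch. The function names `recall_at_k` and `binary_f1` are hypothetical, and the choice to compute F1 on the positive ("will share a photo") class is an assumption; this is not the authors' evaluation code.

```python
from typing import Sequence


def recall_at_k(ranked_ids: Sequence[Sequence[str]],
                gold_ids: Sequence[str],
                k: int = 1) -> float:
    """Fraction of dialogues whose gold photo appears in the top-k ranked candidates.

    ranked_ids[i] is the candidate list for dialogue i, best match first
    (the abstract reports retrieval over 1000 candidates per dialogue).
    """
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_ids, gold_ids))
    return hits / len(gold_ids)


def binary_f1(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """F1 of the positive class (assumed here: 1 = a photo is shared next turn)."""
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    rankings = [["photo_a", "photo_b"], ["photo_c", "photo_d"]]
    gold = ["photo_a", "photo_d"]
    print(recall_at_k(rankings, gold, k=1))
    print(binary_f1([1, 1, 0, 0], [1, 0, 1, 0]))
```

With 1000 candidates per dialogue, a random ranker would reach a recall@1 of about 0.1%, which gives some context for the reported 10.4%.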

