CL · Feb 16, 2020

A Multimodal Dialogue System for Conversational Image Editing

Tzu-Hsiang Lin, Trung Bui, Doo Soon Kim, Jean Oh

arXiv:2002.06484v11.910 citations

Originality: Synthesis-oriented

AI Analysis

This work addresses the challenge of interactive image editing through dialogue, but it is incremental as it applies existing POMDP and DQN methods to a new multimodal task.

The paper tackles conversational image editing by developing a multimodal dialogue system. Its DQN policy outperforms a rule-based baseline and achieves a 90% success rate under high error rates.

In this paper, we present a multimodal dialogue system for Conversational Image Editing. We formulate our multimodal dialogue system as a Partially Observed Markov Decision Process (POMDP) and train it with a Deep Q-Network (DQN) and a user simulator. Our evaluation shows that the DQN policy outperforms a rule-based baseline policy, achieving a 90% success rate under high error rates. We also conducted a real user study and analyzed real user behavior.
