CL · Feb 16, 2020

A Multimodal Dialogue System for Conversational Image Editing

arXiv:2002.06484v1 · 10 citations
AI Analysis

This work addresses the challenge of interactive image editing through dialogue, but it is incremental as it applies existing POMDP and DQN methods to a new multimodal task.

The paper tackled the problem of conversational image editing by developing a multimodal dialogue system, achieving a 90% success rate under high error rates with a DQN policy that outperformed a rule-based baseline.

In this paper, we present a multimodal dialogue system for Conversational Image Editing. We formulate our multimodal dialogue system as a Partially Observed Markov Decision Process (POMDP) and train it with a Deep Q-Network (DQN) and a user simulator. Our evaluation shows that the DQN policy outperforms a rule-based baseline policy, achieving a 90% success rate under high error rates. We also conducted a real user study and analyzed real user behavior.

