A Multimodal Dialogue System for Conversational Image Editing
This work addresses interactive image editing through dialogue, but it is incremental, as it applies existing POMDP and DQN methods to a new multimodal task.
The paper tackles conversational image editing with a multimodal dialogue system whose DQN policy outperforms a rule-based baseline, achieving a 90% success rate under high error rates.
In this paper, we present a multimodal dialogue system for Conversational Image Editing. We formulate the system as a Partially Observable Markov Decision Process (POMDP) and train it with a Deep Q-Network (DQN) against a user simulator. Our evaluation shows that the DQN policy outperforms a rule-based baseline policy, achieving a 90% success rate under high error rates. We also conduct a user study with real users and analyze their behavior.
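For reference, the standard DQN objective that a POMDP dialogue policy of this kind is typically trained with is sketched below. This is the generic textbook formulation, not necessarily the exact variant used here: the network architecture, replay settings, and reward design are not stated in this abstract. The symbols are illustrative assumptions: b_t is the belief state (the POMDP's distribution over hidden user intents, updated from noisy observations), a_t the system action, r_t the reward, \gamma the discount factor, \mathcal{D} a replay buffer of transitions collected by interacting with the user simulator, \theta the online network parameters, and \theta^{-} a periodically copied target network.

```latex
% Standard DQN temporal-difference target (y_t = r_t for terminal transitions)
y_t = r_t + \gamma \max_{a'} Q\left(b_{t+1}, a'; \theta^{-}\right)

% Squared TD error minimized over sampled transitions
\mathcal{L}(\theta) = \mathbb{E}_{(b_t, a_t, r_t, b_{t+1}) \sim \mathcal{D}}
  \left[ \left( y_t - Q\left(b_t, a_t; \theta\right) \right)^{2} \right]
```

At run time the learned policy acts greedily on the current belief, a_t = \arg\max_{a} Q(b_t, a; \theta). Conditioning on the belief rather than on the raw, possibly misrecognized, user input is what allows the policy to stay robust when recognition error rates are high.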