MM AI · Feb 12, 2024

Synthesizing Sentiment-Controlled Feedback For Multimodal Text and Image Data

Puneet Kumar, Sarthak Malik, Balasubramanian Raman, Xiaobai Li

arXiv:2402.07640v4 · 2.31 citations · h-index: 11 · Has Code

Originality: Incremental advance

AI Analysis

The work addresses a gap in human-computer interaction for applications such as education and healthcare, though it appears incremental because it builds on existing multimodal and controllable generation techniques.

The paper tackles the problem of generating sentiment-controlled feedback for multimodal text and image inputs, achieving a sentiment classification accuracy of 77.23%, which is 18.82% higher than without controllability.

The ability to generate sentiment-controlled feedback in response to multimodal inputs comprising text and images addresses a critical gap in human-computer interaction. This capability allows systems to provide empathetic, accurate, and engaging responses, with useful applications in education, healthcare, marketing, and customer service. To this end, we have constructed a large-scale Controllable Multimodal Feedback Synthesis (CMFeed) dataset and proposed a controllable feedback synthesis system. The system features an encoder, decoder, and controllability block for textual and visual inputs. It extracts features using a transformer and a Faster R-CNN network, combining them to generate feedback. The CMFeed dataset includes images, texts, reactions to the posts, human comments with relevance scores, and reactions to these comments. These reactions train the model to produce feedback with specified sentiments, achieving a sentiment classification accuracy of 77.23%, which is 18.82% higher than the accuracy without controllability. Access to the CMFeed dataset and the system's code is available at https://github.com/MIntelligence-Group/CMFeed.
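
The abstract says that text and image features are extracted separately and then combined, with a controllability block steering the sentiment of the output. The toy sketch below illustrates one way such a combination could look. It is not the authors' implementation: the mean pooling, the concatenation, the one-hot `positive`/`negative` control vector, and the names `mean_pool` and `fuse_features` are all assumptions made for illustration, and the small lists stand in for real transformer and Faster R-CNN features.

```python
from typing import List


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average a list of equal-length feature vectors into one vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def fuse_features(
    text_tokens: List[List[float]],
    image_regions: List[List[float]],
    target_sentiment: str,
) -> List[float]:
    """Concatenate pooled text features, pooled region features, and a control vector.

    Hypothetical fusion scheme: the abstract only says the features are combined.
    """
    # Assumed two-class control signal; the label names are illustrative.
    control = {"negative": [1.0, 0.0], "positive": [0.0, 1.0]}[target_sentiment]
    return mean_pool(text_tokens) + mean_pool(image_regions) + control


# Toy stand-ins: two text-token vectors (dim 2) and two image-region vectors (dim 3).
text_tokens = [[1.0, 2.0], [3.0, 4.0]]
image_regions = [[1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]

fused = fuse_features(text_tokens, image_regions, "positive")
print(fused)  # [2.0, 3.0, 2.0, 1.0, 1.0, 0.0, 1.0]
```

In a real system the fused vector would condition a decoder that generates the feedback text. The actual architecture is in the repository linked above.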
