CV, LG, ML | Mar 13, 2018

A Multi-Modal Approach to Infer Image Affect

arXiv:1803.05070v1
Originality: Incremental advance
AI Analysis

This work addresses emotion recognition in images for applications like social media analysis, but it is incremental as it builds on existing multi-modal approaches.

The paper tackles the problem of inferring group affect or emotion in images by combining three additional modalities (human pose, text-based tagging, and CNN-extracted features) and evaluates the approach against baselines. The authors state that, to the best of their knowledge, this is the first time all of the modalities were extracted using deep neural networks.

The group affect or emotion in an image of people can be inferred by extracting features about both the people in the picture and the overall makeup of the scene. State-of-the-art work on this problem investigates a combination of facial features, scene extraction and even audio tonality. This paper combines three additional modalities, namely human pose, text-based tagging and CNN-extracted features and predictions. To the best of our knowledge, this is the first time all of the modalities were extracted using deep neural networks. We evaluate the performance of our approach against baselines and identify insights throughout this paper.
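
The abstract does not say how the three modalities are combined, so the following is only a minimal sketch of one common option, late fusion, in which each modality produces its own class scores and the resulting probabilities are averaged. The function names, the weights, the three class labels and all score values are hypothetical placeholders, not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(modality_scores, weights=None):
    """Weighted average of per-modality class probabilities.

    modality_scores: dict mapping a modality name to a list of raw class scores.
    weights: optional dict of per-modality weights (hypothetical; equal by default).
    """
    names = list(modality_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total_w = sum(weights[name] for name in names)
    n_classes = len(modality_scores[names[0]])
    fused = [0.0] * n_classes
    for name in names:
        probs = softmax(modality_scores[name])
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_w
    return fused


# Made-up scores for three illustrative classes (negative, neutral, positive).
scores = {
    "pose": [0.2, 0.1, 1.5],
    "text_tags": [0.0, 0.3, 2.0],
    "cnn": [0.5, 0.4, 1.0],
}
fused = late_fusion(scores)
predicted = max(range(len(fused)), key=fused.__getitem__)
```

Here `fused` is a probability vector over the placeholder classes and `predicted` is the index of the most likely one. Feature-level fusion, where per-modality feature vectors are concatenated before a single classifier, would be an equally plausible reading of the abstract.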

