CV · AI · Jul 19, 2024

EmoCAM: Toward Understanding What Drives CNN-based Emotion Recognition

arXiv:2407.14314v1 · 1 citation · h-index: 4
Originality: Synthesis-oriented
AI Analysis

This work addresses the explainability of emotion recognition models, a concern for researchers and practitioners in AI and psychology. Its contribution is incremental, since it builds on existing CAM and object detection methods.

The authors study what drives CNN-based emotion recognition. They propose a framework that combines CAM-based techniques with object detection to analyze the image cues used by the EmoNet model. They find that the model primarily focuses on human characteristics and that it is markedly affected by specific image modifications.

Convolutional Neural Networks are particularly suited for image analysis tasks, such as Image Classification, Object Recognition or Image Segmentation. Like all Artificial Neural Networks, however, they are "black box" models, and suffer from poor explainability. This work is concerned with the specific downstream task of Emotion Recognition from images, and proposes a framework that combines CAM-based techniques with Object Detection on a corpus level to better understand on which image cues a particular model, in our case EmoNet, relies to assign a specific emotion to an image. We demonstrate that the model mostly focuses on human characteristics, but also explore the pronounced effect of specific image modifications.
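The corpus-level combination of CAM heatmaps and object detections described above can be illustrated with a minimal sketch. It assumes each image already has a CAM heatmap (a 2D grid of non-negative activations) and a list of detected, labeled boxes. The function names, the toy data, and the scoring rule (share of total activation falling inside each box, averaged per label) are hypothetical choices for illustration and are not taken from the paper.

```python
from collections import defaultdict


def box_activation_share(cam, box):
    """Fraction of the total CAM activation that lies inside a box.

    cam: 2D list of non-negative floats (rows = y, columns = x).
    box: (x0, y0, x1, y1) in pixel indices, end-exclusive.
    Assumption: this share is an illustrative relevance score only.
    """
    x0, y0, x1, y1 = box
    total = sum(sum(row) for row in cam)
    if total == 0:
        return 0.0
    inside = sum(sum(row[x0:x1]) for row in cam[y0:y1])
    return inside / total


def corpus_label_scores(samples):
    """Average activation share per detected object label over a corpus.

    samples: iterable of (cam, detections) pairs, where detections is a
    list of (label, box) tuples produced by some object detector.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cam, detections in samples:
        for label, box in detections:
            sums[label] += box_activation_share(cam, box)
            counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Toy corpus of two 4x4 heatmaps with made-up detections.
cam_a = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
cam_b = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
samples = [
    (cam_a, [("person", (1, 1, 3, 3)), ("chair", (0, 0, 1, 1))]),
    (cam_b, [("person", (0, 0, 2, 1)), ("chair", (2, 2, 4, 4))]),
]
scores = corpus_label_scores(samples)
print(scores)
```

In this toy corpus the "person" boxes capture most of the activation while the "chair" boxes capture none, which is the kind of per-label pattern a corpus-level analysis would surface. A real pipeline would obtain the heatmaps from a CAM method applied to the emotion model and the boxes from an object detector.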

