SeLiNet: Sentiment enriched Lightweight Network for Emotion Recognition in Images
This work addresses efficient on-device emotion recognition for mobile and embedded applications. It is incremental: it builds on existing methods and trades a small loss in accuracy for a large reduction in model size.
The paper tackles emotion recognition in images by proposing SeLiNet, a sentiment-enriched lightweight network, achieving an Average Precision score of 27.17 on the EMOTIC dataset, close to the baseline of 27.38, while reducing model size by over 85%.
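To make the accuracy trade-off concrete, the short sketch below computes the absolute and relative AP drop from the two scores quoted above. The helper name `ap_tradeoff` is hypothetical and only illustrates the arithmetic; it is not part of the paper.

```python
def ap_tradeoff(baseline_ap, model_ap):
    """Return (absolute drop in AP points, relative drop in percent)."""
    absolute = baseline_ap - model_ap
    relative = 100.0 * absolute / baseline_ap
    return absolute, relative


# AP scores on EMOTIC as reported in the summary above.
absolute, relative = ap_tradeoff(baseline_ap=27.38, model_ap=27.17)
print(f"AP drop: {absolute:.2f} points ({relative:.2f}% relative)")
# prints "AP drop: 0.21 points (0.77% relative)"
```

In other words, a size reduction of over 85% costs roughly 0.2 AP points, which is under 1% of the baseline score.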
In this paper, we propose SeLiNet, a sentiment-enriched lightweight network, and an end-to-end on-device pipeline for contextual emotion recognition in images. The SeLiNet model consists of a body feature extractor, an image aesthetics feature extractor, and a learning-based fusion network that jointly estimates discrete emotions and human sentiment. On the EMOTIC dataset, the proposed approach achieves an Average Precision (AP) score of 27.17, compared with the baseline AP score of 27.38, while reducing the model size by more than 85%. In addition, we report an on-device AP score of 26.42 with a model size reduction of more than 93% relative to the baseline.
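The two-branch, two-task structure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two feature extractors are stubbed out as precomputed feature vectors, the fusion is assumed to be a concatenation followed by one dense layer, and the class name `FusionSketch`, all layer sizes, and the three-way sentiment output are hypothetical. Only the 26 emotion outputs reflect the number of discrete categories in EMOTIC.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Randomly initialised weights and zero biases for a dense layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(x, layer):
    """Apply a dense layer (weights, bias) to the input vector x."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class FusionSketch:
    """Assumed fusion network: concatenate both feature vectors, then two task heads."""

    # All sizes are illustrative assumptions except n_emotions=26 (EMOTIC categories).
    def __init__(self, body_dim=8, aesthetics_dim=4, hidden_dim=6,
                 n_emotions=26, n_sentiments=3, seed=0):
        rng = random.Random(seed)
        self.fuse = make_linear(body_dim + aesthetics_dim, hidden_dim, rng)
        self.emotion_head = make_linear(hidden_dim, n_emotions, rng)
        self.sentiment_head = make_linear(hidden_dim, n_sentiments, rng)

    def forward(self, body_features, aesthetics_features):
        # Fusion by concatenation followed by a ReLU dense layer (an assumption).
        fused_input = list(body_features) + list(aesthetics_features)
        hidden = [max(0.0, h) for h in linear(fused_input, self.fuse)]
        # Multi-label emotion scores: one independent sigmoid per category.
        emotion_scores = [1.0 / (1.0 + math.exp(-z))
                          for z in linear(hidden, self.emotion_head)]
        # Sentiment as a single softmax distribution (assumed three classes).
        sentiment_probs = softmax(linear(hidden, self.sentiment_head))
        return emotion_scores, sentiment_probs


model = FusionSketch()
emotion_scores, sentiment_probs = model.forward([0.5] * 8, [0.2] * 4)
print(len(emotion_scores), len(sentiment_probs))
```

Sharing one fused representation between the emotion and sentiment heads is what makes the estimation joint in this sketch; a real model would train both heads together on a combined loss.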