CV · Oct 30, 2020

Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss

Panagiotis Paraskevas Filntisis, Niki Efthymiou, Gerasimos Potamianos, Petros Maragos

arXiv:2010.16396v19.623 citations · Has Code

Originality: Incremental advance

AI Analysis

This work addresses emotion understanding in videos, with applications such as human-computer interaction. The contribution is incremental, since it builds on existing frameworks.

The paper tackles emotion recognition in videos by extending the Temporal Segment Network to incorporate body, context, and visual-semantic embeddings. It achieves an Emotion Recognition Score of 0.26235 on the BoLD test set, surpassing the previous best of 0.2530.

We present our winning submission to the First International Workshop on Bodily Expressed Emotion Understanding (BEEU) challenge. Based on recent literature on the effect of context/environment on emotion, as well as on visual representations with semantic meaning using word embeddings, we extend the Temporal Segment Network framework to accommodate both. Our method is verified on the validation set of the Body Language Dataset (BoLD) and achieves an Emotion Recognition Score of 0.26235 on the test set, surpassing the previous best result of 0.2530.
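As a rough illustration of the segment-based idea behind Temporal Segment Network with a body stream and a context stream, here is a minimal sketch in plain Python. It is not the authors' implementation. The function names, the choice of the centre frame of each segment, and the simple averaging used to fuse the two streams are all assumptions made for illustration, and the visual-semantic embedding loss is not covered.

```python
from typing import Callable, List, Sequence

Scores = List[float]


def segment_indices(num_frames: int, num_segments: int) -> List[int]:
    """Pick the centre frame of each of num_segments equal-length segments.

    Assumption: deterministic centre sampling, chosen here for reproducibility.
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    seg_len = num_frames / num_segments
    return [int(seg_len * i + seg_len / 2) for i in range(num_segments)]


def mean_scores(rows: Sequence[Scores]) -> Scores:
    """Element-wise average of equally sized score vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def video_scores(
    body_frames: Sequence,
    context_frames: Sequence,
    body_model: Callable[[object], Scores],
    context_model: Callable[[object], Scores],
    num_segments: int = 3,
) -> Scores:
    """Score a video from sparsely sampled frames of two hypothetical streams."""
    idx = segment_indices(len(body_frames), num_segments)
    # Segment consensus: average the per-frame scores within each stream.
    body = mean_scores([body_model(body_frames[i]) for i in idx])
    context = mean_scores([context_model(context_frames[i]) for i in idx])
    # Assumed late fusion: plain average of the two stream-level scores.
    return mean_scores([body, context])
```

Here `body_model` and `context_model` stand in for any per-frame classifier that returns one score per emotion category. Averaging is only one possible consensus and fusion choice.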
