HC · Jan 18, 2017

Implicit Media Tagging and Affect Prediction from video of spontaneous facial expressions, recorded with depth camera

arXiv:1701.05248v1 · 4 citations

Originality: Incremental advance

AI Analysis

This work addresses the challenge of affect prediction for applications in human-computer interaction and media tagging, but it is incremental as it builds on existing methods for facial analysis.

The paper addresses the automatic evaluation of emotional response from spontaneous facial expressions recorded by a depth camera. Its method predicts a four-dimensional representation of affect and identifies the period of strongest emotional response, with high agreement between the recordings of independent viewers.

We present a method that automatically evaluates emotional response from spontaneous facial activity recorded by a depth camera. The automatic evaluation of emotional response, or affect, is a fascinating challenge with many applications, including human-computer interaction, media tagging and human affect prediction. Our approach to this problem is based on the inferred activity of facial muscles over time, as captured by a depth camera recording an individual's facial activity. Our contribution is two-fold. First, we constructed a database of publicly available short video clips, which elicit a strong emotional response in a consistent manner across different individuals. Each video was tagged with its characteristic emotional response along four scales: Valence, Arousal, Likability and Rewatch (the desire to watch again). The second contribution is a two-step prediction method, based on learning, which was trained and tested using this database of tagged video clips. Our method was able to successfully predict the aforementioned four-dimensional representation of affect, as well as to identify the period of strongest emotional response in the viewing recordings, in a manner that is blind to the video clip being watched, revealing a significantly high agreement between the recordings of independent viewers.
