Eye-based Continuous Affect Prediction
This work addresses the underutilization of eye-based cues in affective computing, offering incremental improvements for emotion prediction tasks in domains like human-computer interaction.
The paper tackles the problem of predicting continuous affect (arousal and valence) by proposing and refining a set of eye-based features estimated from video. Combined with speech features, these are shown to benefit affect prediction, with performance assessed against group-of-humans-level performance on the RECOLA test set.
Eye-based information channels include the pupils, gaze, saccades, fixational movements, and numerous forms of eye opening and closure. Pupil size variation indicates cognitive load and emotion, while a person's gaze direction is said to be congruent with the motivation to approach or avoid stimuli. The eyelids are involved in facial expressions that can encode basic emotions. Additionally, eye-based cues can have implications for human annotators of emotions or feelings. Despite these facts, the use of eye-based cues in affective computing is in its infancy, and this work is intended to start to address this. Eye-based feature sets that can be estimated from video, incorporating data from all of the aforementioned information channels, are proposed. The feature sets are refined through continuous arousal and valence learning and prediction experiments on the RECOLA validation set. The eye-based features are then combined with a speech feature set to confirm their usefulness and to assess affect prediction performance against group-of-humans-level performance on the RECOLA test set. The core contribution of this paper, a refined eye-based feature set, is shown to provide benefits for affect prediction. It is hoped that this work stimulates further research into eye-based affective computing.
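As a rough illustration of what an eye-closure feature estimated from video and its combination with speech features might look like, the following minimal sketch summarises a window of per-frame eye-openness values and concatenates the result with a speech feature vector. Everything in it is an assumption made for illustration: the openness signal, the 0.2 closure threshold, the choice of statistics, the function names, and the use of simple concatenation are hypothetical and are not taken from the paper's feature set or fusion method.

```python
from statistics import mean, pstdev


def eye_closure_features(openness, closed_below=0.2):
    """Summarise a window of per-frame eye-openness values (0 = shut, 1 = open).

    The input signal and the 0.2 threshold are illustrative assumptions,
    not values from the paper.
    """
    if not openness:
        raise ValueError("window must contain at least one frame")
    closed = [value < closed_below for value in openness]
    # Count closure onsets: frames where the eye goes from open to closed.
    onsets = sum(
        1 for prev, cur in zip([False] + closed[:-1], closed) if cur and not prev
    )
    return {
        "mean_openness": mean(openness),
        "std_openness": pstdev(openness),
        "closed_ratio": sum(closed) / len(closed),
        "closure_count": onsets,
    }


def fuse(eye_features, speech_features):
    """Feature-level fusion (assumed): concatenate the two vectors for one window."""
    return list(eye_features) + list(speech_features)


# Synthetic window of eight frames containing two brief closures.
window = [1.0, 0.9, 0.1, 0.1, 0.8, 1.0, 0.15, 0.9]
features = eye_closure_features(window)
# The two speech values are placeholders, not real speech features.
fused = fuse(features.values(), [0.3, -1.2])
```

The fused vector would then be passed to a regressor that predicts arousal or valence for the window; the choice of regressor is left open here.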