HC · May 17, 2018

Affective computing using speech and eye gaze: a review and bimodal system proposal for continuous affect prediction

arXiv:1805.06652v1 · 3 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This work addresses emotion assessment in audio-video communication for applications like teleconferencing, healthcare, and education, but it is incremental as it builds on existing multi-modal approaches.

The paper tackles continuous affect prediction by proposing a bimodal system combining speech and eye gaze, showing that adding eye gaze to speech improves prediction by 6.13% for valence and 1.62% for arousal.

Speech has been a widely used modality in the field of affective computing. Recently, however, there has been growing interest in multi-modal affective computing systems. These multi-modal systems incorporate both verbal and non-verbal features for affective computing tasks. Such systems are advantageous for emotion assessment of individuals in audio-video communication environments such as teleconferencing, healthcare, and education. A review of the literature shows that eye gaze features extracted from video remain largely unexploited for continuous affect prediction. This work presents a review of the literature within the emotion classification and continuous affect prediction sub-fields of affective computing for both speech and eye gaze modalities. Additionally, continuous affect prediction experiments using speech and eye gaze modalities are presented. A baseline system is proposed using open-source software, the performance of which is assessed on a publicly available audio-visual corpus. System performance is further assessed in a cross-corpus and cross-lingual experiment. The experimental results suggest that eye gaze is an effective supportive modality for speech when used in a bimodal continuous affect prediction system. The addition of eye gaze to speech in a simple feature fusion framework yields a prediction improvement of 6.13% for valence and 1.62% for arousal.
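The abstract does not say how the "simple feature fusion framework" is implemented. As a rough illustration only, feature-level fusion is commonly done by concatenating the per-frame feature vectors of each modality before they reach a regressor. The sketch below assumes that reading; the function name `fuse_features`, the frame alignment requirement, and the toy values are all hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def fuse_features(
    speech: Sequence[Sequence[float]],
    gaze: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Concatenate frame-aligned speech and eye gaze feature vectors.

    Assumption: both modalities were already resampled to the same
    frame rate, so frame i of `speech` matches frame i of `gaze`.
    """
    if len(speech) != len(gaze):
        raise ValueError("modalities must have the same number of frames")
    # One fused vector per frame: speech features followed by gaze features.
    return [list(s) + list(g) for s, g in zip(speech, gaze)]


# Made-up toy values: 2 frames, 3 speech features and 2 gaze features each.
speech_frames = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
gaze_frames = [[1.0, 0.0], [0.5, 0.5]]

fused = fuse_features(speech_frames, gaze_frames)
print(len(fused), len(fused[0]))
```

Under this assumption, the fused vectors would feed a single valence or arousal regressor, in contrast to decision-level fusion, where each modality gets its own model and only the predictions are combined.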

