HC CV Jul 9, 2016

Multimodal Affect Recognition using Kinect

arXiv:1607.02652v1 15.223 citations

Originality: Incremental advance

AI Analysis

This work addresses emotion-aware systems for applications like interactive robots and tutors, but it is incremental, building on existing multimodal approaches.

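The paper tackles multimodal emotion recognition by combining facial, body, and speech data, with a Kinect used for facial feature extraction and body-joint tracking. It uses temporal features and decision-level fusion by majority voting. Combining rule-based emotion templates with supervised learning gave better recognition rates than supervised learning alone, and temporal features gave higher rates than position-based features only.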

Affect (emotion) recognition has gained significant attention from researchers in the past decade. Emotion-aware computer systems and devices have many applications, ranging from interactive robots and intelligent online tutors to emotion-based navigation assistants. In this research, data from multiple modalities such as face, head, hand, body, and speech were used for affect recognition. The research used a color and depth sensing device such as the Kinect for facial feature extraction and for tracking human body joints. Temporal features across multiple frames were used for affect recognition. Event-driven decision-level fusion combined the results from each individual modality, using majority voting to recognize the emotions. The study also implemented affect recognition by matching the features to rule-based emotion templates per modality. Experiments showed that multimodal affect recognition rates using a combination of emotion templates and supervised learning were better than recognition rates based on supervised learning alone. Recognition rates obtained using temporal features were higher than recognition rates obtained using position-based features only.
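The two ideas named in the abstract, temporal features across frames and decision-level fusion by majority voting, can be illustrated with a minimal Python sketch. It is not the paper's implementation. The function names, the choice of frame-to-frame displacement as the temporal feature, the use of None for a modality that detected no event, and the rule that ties return None are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def temporal_features(positions: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return frame-to-frame displacements for each tracked coordinate.

    `positions` holds one list of coordinates per frame (hypothetical layout).
    Displacement is only one possible temporal feature; the paper may use others.
    """
    return [
        [curr - prev for prev, curr in zip(frame_a, frame_b)]
        for frame_a, frame_b in zip(positions, positions[1:])
    ]


def fuse_by_majority(votes: Dict[str, Optional[str]]) -> Optional[str]:
    """Combine per-modality emotion labels by majority vote.

    A value of None means the modality saw no event and abstains (assumption).
    Ties and empty ballots return None (assumption, not taken from the paper).
    """
    labels = [label for label in votes.values() if label is not None]
    if not labels:
        return None
    counts = Counter(labels)
    best = max(counts.values())
    winners = [label for label, count in counts.items() if count == best]
    return winners[0] if len(winners) == 1 else None


if __name__ == "__main__":
    # Three frames of two coordinates each (made-up values).
    print(temporal_features([[0.0, 1.0], [0.5, 1.0], [1.5, 0.0]]))
    # Per-modality decisions for one time window (made-up values).
    print(fuse_by_majority(
        {"face": "happy", "body": "happy", "speech": "angry", "hand": None}
    ))
```

Returning None on a tie keeps the sketch from silently preferring one modality; a real system might instead fall back to classifier confidence or to a fixed modality priority.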
