CV · HC · Oct 13, 2015

Variable-state Latent Conditional Random Fields for Facial Expression Recognition and Action Unit Detection

arXiv:1510.03909v1 · 74 citations
Originality: Incremental advance
AI Analysis

This work aims to improve facial expression recognition and action unit detection for applications in human-computer interaction and affective computing. It is an incremental advance over existing L-CRF methods.

The paper models facial expression and action unit dynamics in videos with a variable-state latent conditional random field (VSL-CRF) that automatically selects the optimal latent states. It reports better generalization performance on three public databases than traditional L-CRFs and other state-of-the-art models.

Automated recognition of facial expressions of emotions, and detection of facial action units (AUs), from videos depends critically on modeling their dynamics. These dynamics are characterized by changes in temporal phases (onset-apex-offset) and in the intensity of emotion expressions and AUs, the appearance of which may vary considerably among target subjects, making the recognition/detection task very challenging. The state-of-the-art Latent Conditional Random Fields (L-CRF) framework allows one to efficiently encode these dynamics through latent states that account for the temporal consistency in emotion expression and the ordinal relationships between its intensity levels. These latent states are typically assumed to be either unordered (nominal) or fully ordered (ordinal). Yet, such an approach is often too restrictive. For instance, in the case of AU detection, the goal is to discriminate between the segments of an image sequence in which a given AU is active or inactive. While the sequence segments containing activation of the target AU may be better described using ordinal latent states, the inactive segments are better described using unordered (nominal) latent states, as no assumption can be made about their underlying structure (since they can contain either neutral faces or activations of non-target AUs). To address this, we propose the variable-state L-CRF (VSL-CRF) model that automatically selects the optimal latent states for the target image sequence. To reduce overfitting of the model to either the nominal or the ordinal latent states, we propose a novel graph-Laplacian regularization of the latent states. Our experiments on three public expression databases show that the proposed model achieves better generalization performance compared to traditional L-CRFs and other related state-of-the-art models.
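The abstract names a graph-Laplacian regularization of the latent states but does not give its form. As a rough illustration only, the sketch below computes the generic graph-Laplacian penalty tr(W^T L W) over per-state parameter vectors and checks it against the equivalent sum of squared differences along graph edges. The chain graph linking neighboring states, the weight values, and all function names are assumptions made for this example, not the paper's actual regularizer.

```python
# Generic graph-Laplacian penalty over K latent-state parameter vectors.
# All names and values are hypothetical; this is not the paper's exact method.

def chain_adjacency(k):
    """Adjacency matrix of a chain graph linking each state to its neighbor."""
    adj = [[0.0] * k for _ in range(k)]
    for i in range(k - 1):
        adj[i][i + 1] = 1.0
        adj[i + 1][i] = 1.0
    return adj

def laplacian(adj):
    """Graph Laplacian L = D - A, with D the diagonal degree matrix."""
    k = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(k)]
            for i in range(k)]

def laplacian_penalty(weights, lap):
    """tr(W^T L W) = sum_ij L[i][j] * <w_i, w_j>, one row of W per state."""
    k = len(weights)
    total = 0.0
    for i in range(k):
        for j in range(k):
            dot = sum(a * b for a, b in zip(weights[i], weights[j]))
            total += lap[i][j] * dot
    return total

def pairwise_penalty(weights, adj):
    """Equivalent form: sum over edges of A[i][j] * ||w_i - w_j||^2."""
    k = len(weights)
    total = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            sq = sum((a - b) ** 2 for a, b in zip(weights[i], weights[j]))
            total += adj[i][j] * sq
    return total

# Three hypothetical latent states, each with a 2-dimensional parameter vector.
weights = [[0.0, 1.0], [0.5, 1.0], [2.0, 0.0]]
adj = chain_adjacency(3)
lap = laplacian(adj)

print(laplacian_penalty(weights, lap))  # 3.5
print(pairwise_penalty(weights, adj))   # 3.5
```

A penalty of this kind grows when neighboring states have very different parameters, so it pulls connected states toward each other. How the paper builds its graph and which parameters it penalizes would need to be checked against the full text.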

