CV · Nov 29, 2017

Learning Spatio-temporal Features with Partial Expression Sequences for on-the-Fly Prediction

arXiv:1711.10914v1 · 2 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for real-time prediction in interactive settings such as affective agents, where delays are intolerable. It represents a domain-specific incremental improvement.

The paper tackles delayed facial expression prediction in video sequences by proposing a spatio-temporal feature learning method that enables on-the-fly prediction from partial sequences, achieving higher recognition rates than state-of-the-art methods on two datasets.

Spatio-temporal feature encoding is essential for capturing facial expression dynamics in video sequences. At test time, most spatio-temporal encoding methods assume that a temporally segmented sequence is fed to the learned model, so prediction may have to wait until the full sequence is available to an auxiliary task that performs the temporal segmentation. This delays the expression prediction. In an interactive setting, such as affective interactive agents, such a delay cannot be tolerated. Therefore, it is essential to train a model that can accurately predict the facial expression "on-the-fly" (as frames are fed to the system). In this paper, we propose a new spatio-temporal feature learning method that allows prediction with partial sequences, so that prediction can be performed on-the-fly. The proposed method uses an estimated expression intensity to generate dense labels, which are used to regulate the training of the prediction model through a novel objective function. As a result, the learned spatio-temporal features can robustly predict the expression from partial (incomplete) expression sequences, on-the-fly. Experimental results showed that the proposed method achieved higher recognition rates than state-of-the-art methods on both datasets. More importantly, the results verified that the proposed method improved frame-wise prediction with partial expression sequence inputs.
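
The abstract does not specify how the dense labels or the objective function are defined, so the following is only an illustrative sketch of the general idea, not the paper's method. It assumes a per-frame intensity estimate in [0, 1] is already available, and it uses a simple mix of a neutral class and the sequence-level label as the per-frame target, scored with a plain cross-entropy. The names `NEUTRAL`, `dense_labels`, and `dense_cross_entropy` are hypothetical.

```python
import math

NEUTRAL = 0  # hypothetical index of the neutral class (an assumption)


def dense_labels(sequence_label, intensities, num_classes):
    """Expand one sequence-level label into per-frame soft labels.

    Each frame's target mixes the neutral class and the sequence label
    in proportion to that frame's estimated intensity (assumed in [0, 1]).
    This mixing rule is an assumption for illustration only.
    """
    labels = []
    for a in intensities:
        target = [0.0] * num_classes
        target[NEUTRAL] += 1.0 - a
        target[sequence_label] += a
        labels.append(target)
    return labels


def dense_cross_entropy(frame_probs, labels, eps=1e-12):
    """Mean per-frame cross-entropy between predictions and soft labels.

    A stand-in for the paper's objective, which the abstract does not give.
    """
    total = 0.0
    for probs, target in zip(frame_probs, labels):
        total -= sum(t * math.log(p + eps) for t, p in zip(target, probs))
    return total / len(labels)


# Four frames of one sequence, from onset (0.0) to apex (1.0).
intensities = [0.0, 0.25, 0.5, 1.0]
labels = dense_labels(sequence_label=2, intensities=intensities, num_classes=3)
```

Because every frame carries its own target, a model trained this way receives supervision on every prefix of the sequence, which is what makes prediction from partial input plausible at test time.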
