Emotion Dependent Facial Animation from Affective Speech
This work addresses the need for more naturalistic conversational agents in human-computer interaction by improving facial animation from affective speech. The contribution is incremental, since it builds on existing emotion classification and synthesis techniques.
The paper tackles the problem of generating facial animations synchronized with affective speech by proposing a two-stage deep learning approach that first classifies speech into emotion categories and then uses separate estimators for each category to synthesize facial shapes. The method achieves lower Mean Squared Error and better landmark animations compared to a universal model, as evaluated on the SAVEE dataset.
In human-computer interaction, facial animation synchronized with affective speech can deliver more naturalistic conversational agents. In this paper, we present a two-stage deep learning approach for affective speech-driven facial shape animation. In the first stage, we classify affective speech into seven emotion categories. In the second stage, we train a separate deep estimator within each emotion category to synthesize facial shape from the affective speech. Objective and subjective evaluations are performed on the SAVEE dataset. The proposed emotion-dependent facial shape model achieves a lower Mean Squared Error (MSE) loss and generates better landmark animations than a universal model trained regardless of emotion.
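The two-stage routing described above can be illustrated with a minimal sketch. It is not the authors' implementation: a nearest-centroid classifier and per-category least-squares regressors stand in for the deep networks, the data is synthetic rather than SAVEE, and every function name, dimension, and noise level below is a hypothetical choice made for illustration.

```python
import numpy as np

N_EMOTIONS = 7  # the abstract states seven emotion categories


def add_bias(X):
    """Append a constant column so the linear maps can learn an offset."""
    return np.hstack([X, np.ones((len(X), 1))])


def fit_linear(X, Y):
    """Least-squares linear map (stand-in for a deep estimator)."""
    W, *_ = np.linalg.lstsq(add_bias(X), Y, rcond=None)
    return W


def predict_linear(W, X):
    return add_bias(X) @ W


def fit_centroids(X, labels):
    """Stage 1 stand-in: one mean feature vector per emotion category."""
    return np.stack([X[labels == k].mean(axis=0) for k in range(N_EMOTIONS)])


def classify(centroids, X):
    """Assign each sample to the nearest emotion centroid."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def predict_emotion_dependent(centroids, estimators, X):
    """Stage 2: route each sample to the estimator of its predicted emotion."""
    predicted = classify(centroids, X)
    Y = np.empty((len(X), estimators[0].shape[1]))
    for k, W in enumerate(estimators):
        mask = predicted == k
        if mask.any():
            Y[mask] = predict_linear(W, X[mask])
    return Y


def mse(Y_true, Y_pred):
    return float(np.mean((Y_true - Y_pred) ** 2))


def make_data(rng, centers, maps, n_per_class):
    """Synthetic speech features and landmarks; each emotion has its own map."""
    X, Y, labels = [], [], []
    for k in range(N_EMOTIONS):
        Xk = centers[k] + rng.normal(size=(n_per_class, centers.shape[1]))
        Yk = Xk @ maps[k] + 0.01 * rng.normal(size=(n_per_class, maps.shape[2]))
        X.append(Xk)
        Y.append(Yk)
        labels.append(np.full(n_per_class, k))
    return np.vstack(X), np.vstack(Y), np.concatenate(labels)


def run_demo(seed=0, feat_dim=8, shape_dim=6):
    rng = np.random.default_rng(seed)
    centers = rng.normal(scale=10.0, size=(N_EMOTIONS, feat_dim))
    maps = rng.normal(size=(N_EMOTIONS, feat_dim, shape_dim))
    X_tr, Y_tr, lab_tr = make_data(rng, centers, maps, 200)
    X_te, Y_te, _ = make_data(rng, centers, maps, 50)

    centroids = fit_centroids(X_tr, lab_tr)
    estimators = [fit_linear(X_tr[lab_tr == k], Y_tr[lab_tr == k])
                  for k in range(N_EMOTIONS)]
    universal = fit_linear(X_tr, Y_tr)  # baseline that ignores emotion

    emo_mse = mse(Y_te, predict_emotion_dependent(centroids, estimators, X_te))
    uni_mse = mse(Y_te, predict_linear(universal, X_te))
    return emo_mse, uni_mse


if __name__ == "__main__":
    emo_mse, uni_mse = run_demo()
    print(f"emotion-dependent MSE: {emo_mse:.4f}")
    print(f"universal MSE:         {uni_mse:.4f}")
```

The synthetic data is constructed so that each category follows its own feature-to-landmark mapping, which is the situation in which per-category estimators are expected to help. The numbers it prints say nothing about the results reported on SAVEE.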