SD AS | May 2, 2018

OMG Emotion Challenge - ExCouple Team

arXiv:1805.01576v1

Originality: Synthesis-oriented

AI Analysis

This is an incremental contribution aimed at emotion recognition researchers, focusing on audio-only processing.

The authors address emotion recognition from audio by training a GAN on IEMOCAP to learn an unsupervised audio representation, which they then apply to the OMG Emotion Dataset to predict Arousal and Valence values.

The proposed model covers only the audio modality. All videos in the OMG Emotion Dataset are first converted to WAV files. The model uses semi-supervised learning: a GAN is trained without labels on a different database (IEMOCAP), and part of the GAN (part of its autoencoder) is then reused to produce audio representations. Spectrograms are extracted from 1-second windows of audio sampled at 16 kHz and fed to this representation model, which was trained on the other database in an unsupervised way. The resulting representation is passed to a convolutional network followed by a Dense layer with 'tanh' activation, which predicts the Arousal and Valence values. To combine the 1-second windows into a single prediction per utterance, the median of the per-window predictions is taken.
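To make the windowing and utterance-level aggregation concrete, here is a minimal NumPy sketch. It assumes mono audio already resampled to 16 kHz. The STFT settings (512-point FFT, 160-sample hop, Hann window), the zero-padding of the final partial window, and `hypothetical_head` (a stand-in for the GAN-derived encoder plus the convolutional network and tanh Dense layer) are all illustrative assumptions, not the authors' configuration.

```python
import numpy as np

SAMPLE_RATE = 16_000  # 16 kHz, as stated in the summary
WINDOW_SECONDS = 1.0  # 1-second analysis windows


def split_into_windows(waveform: np.ndarray, sr: int = SAMPLE_RATE,
                       seconds: float = WINDOW_SECONDS) -> np.ndarray:
    """Split a mono waveform into non-overlapping fixed-length windows.

    Assumption: the last partial window is zero-padded to full length.
    """
    win = int(sr * seconds)
    n_windows = max(1, int(np.ceil(len(waveform) / win)))
    padded = np.zeros(n_windows * win, dtype=np.float32)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_windows, win)


def log_spectrogram(window: np.ndarray, n_fft: int = 512, hop: int = 160) -> np.ndarray:
    """Log-magnitude STFT with a Hann window (frame and hop sizes are hypothetical)."""
    hann = np.hanning(n_fft)
    n_frames = 1 + (len(window) - n_fft) // hop
    frames = np.stack([window[i * hop: i * hop + n_fft] * hann for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(magnitude)  # shape: (n_frames, n_fft // 2 + 1)


def hypothetical_head(spec: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for encoder + CNN + Dense('tanh').

    Mean-pools over time, applies a linear map, then tanh, so each
    output lies in [-1, 1] like a tanh-activated Dense layer.
    """
    pooled = spec.mean(axis=0)        # (n_bins,)
    return np.tanh(pooled @ weights)  # (2,) -> [arousal, valence]


def predict_utterance(waveform: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Predict per 1-second window, then take the median across windows."""
    windows = split_into_windows(waveform)
    per_window = np.stack([hypothetical_head(log_spectrogram(w), weights) for w in windows])
    return np.median(per_window, axis=0)


# Usage: 2.5 s of random noise stands in for a real utterance.
rng = np.random.default_rng(0)
audio = rng.standard_normal(int(2.5 * SAMPLE_RATE)).astype(np.float32)
weights = rng.standard_normal((257, 2)) * 0.01  # 257 = 512 // 2 + 1 frequency bins
arousal, valence = predict_utterance(audio, weights)
```

Taking the median rather than the mean makes the utterance-level score less sensitive to a few outlier windows, for example ones dominated by silence or the zero-padded tail.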
