SD AS Apr 18, 2018

Shaking Acoustic Spectral Sub-bands Can Better Regularize Learning in Affective Computing

arXiv:1804.06779v1
Originality: Synthesis-oriented
AI Analysis

This is an incremental improvement for affective computing, specifically in speech emotion recognition.

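The paper tackles speech emotion recognition by applying Shake-Shake regularization to acoustic sub-bands. It reports that shaking sub-bands independently yields better models than shaking the entire spectral-temporal feature map, and that, with suitable patience in early stopping, the proposed models outperform the baseline while keeping a smaller gap between training and validation performance.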

In this work, we investigate a recently proposed regularization technique based on multi-branch architectures, called Shake-Shake regularization, for the task of speech emotion recognition. We also propose variants that incorporate domain knowledge into the model configuration. The experimental results demonstrate that (1) independently shaking sub-bands delivers favorable models compared to shaking the entire spectral-temporal feature maps, and (2) with proper patience in early stopping, the proposed models can simultaneously outperform the baseline and maintain a smaller performance gap between training and validation.
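The difference between shaking the whole feature map and shaking sub-bands independently can be illustrated with a minimal NumPy sketch of the forward pass only. The function name `shake_combine`, the `(batch, freq, time)` layout, and the even split of frequency bins into `n_subbands` groups are assumptions made for illustration, not details taken from the paper. Shake-Shake also draws separate coefficients for the backward pass, which this sketch omits.

```python
import numpy as np

def shake_combine(branch_a, branch_b, n_subbands=1, training=True, rng=None):
    """Blend two branch outputs of shape (batch, freq, time).

    n_subbands=1 shakes the entire feature map with one coefficient per
    example; n_subbands>1 draws an independent coefficient for each
    frequency sub-band (hypothetical even split along the freq axis).
    """
    if branch_a.shape != branch_b.shape:
        raise ValueError("branch outputs must have the same shape")
    batch, freq, _ = branch_a.shape

    if training:
        rng = np.random.default_rng() if rng is None else rng
        # One random blending coefficient per example and sub-band.
        alpha = rng.uniform(0.0, 1.0, size=(batch, n_subbands))
    else:
        # At evaluation time the branches are averaged.
        alpha = np.full((batch, n_subbands), 0.5)

    # Map every frequency bin to the index of its sub-band.
    band_of_bin = (np.arange(freq) * n_subbands) // freq
    # Expand to (batch, freq, 1) so it broadcasts over the time axis.
    alpha_per_bin = alpha[:, band_of_bin][:, :, None]

    return alpha_per_bin * branch_a + (1.0 - alpha_per_bin) * branch_b
```

With `n_subbands=2`, the lower and upper halves of the frequency axis receive different random weights for the same example, whereas `n_subbands=1` applies a single weight to the whole map.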

