SD · Mar 22

Emotion-Aware Quantization for Discrete Speech Representations: An Analysis of Emotion Preservation

Haoguang Zhou, Siyi Wang, Jingyao Wu, James Bailey, Ting Dang

arXiv:2603.21224 34.21 citations h-index: 5

AI Analysis

This paper addresses the preservation of emotional information in speech processing, which matters for applications such as affective computing. The contribution is incremental, since it builds on existing quantization techniques.

The paper examines how emotional information degrades in discrete speech representations under aggressive compression, showing that residual vector quantization (RVQ) disproportionately degrades emotion, with uneven loss across emotion classes. It introduces emotion-aware quantization methods, including Emo-Q, which improves emotion recognition performance at lower bitrates.

Modern speech systems increasingly use discretized self-supervised speech representations for compression and integration with token-based models, yet their impact on emotional information remains unclear. We study how residual vector quantization (RVQ) reshapes emotional information in discrete speech representations from both representation- and task-level perspectives. Our analysis shows that aggressive compression disproportionately degrades emotion, with uneven loss across emotion classes and model architectures. To address this, we introduce emotion-aware quantization using emotion-specific and emotion-biased codebooks, improving the preservation of both hard and soft emotion perception. We further propose Emo-Q, a lightweight routed quantization method that selects emotion-specialized codebooks, improving emotion recognition performance at lower bitrates. These results highlight the importance of emotion-aware discretization for robust affective speech processing.
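For readers unfamiliar with residual vector quantization, the following is a minimal generic sketch of the technique, not the paper's implementation. Each stage quantizes the residual left by the previous stage, so dropping later stages lowers the bitrate at the cost of a coarser reconstruction. The random codebooks, the feature dimension of 16, the codebook size of 64, and the 50 frames-per-second rate are all illustrative assumptions; in practice codebooks are learned from data.

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Quantize vectors x of shape (N, D) with a list of (K, D) codebooks."""
    residual = x.copy()
    indices = []
    for cb in codebooks:
        # Squared Euclidean distance from every residual to every code.
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        indices.append(idx)
        # The next stage quantizes whatever this stage failed to capture.
        residual = residual - cb[idx]
    return np.stack(indices, axis=1)  # shape (N, num_stages)

def rvq_decode(indices, codebooks):
    """Reconstruct vectors by summing the selected code from each stage."""
    return sum(cb[indices[:, s]] for s, cb in enumerate(codebooks))

def bitrate_bps(num_stages, codebook_size, frames_per_second):
    """Bits per second: one log2(K)-bit index per stage per frame."""
    return num_stages * np.log2(codebook_size) * frames_per_second

# Hypothetical setup: random stand-ins for speech features and codebooks.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))
codebooks = [rng.normal(size=(64, 16)) * 0.5 ** s for s in range(4)]

codes = rvq_encode(features, codebooks)
recon = rvq_decode(codes, codebooks)
# Keeping only the first two stages halves the bitrate.
coarse = rvq_decode(codes[:, :2], codebooks[:2])
```

Under these assumptions, four stages of 64 codes at 50 frames per second cost `bitrate_bps(4, 64, 50)` = 1200 bits per second, and two stages cost 600. This is the trade-off the abstract refers to as aggressive compression.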

