CV · May 7

Multimodal Emotion Recognition via Causal-Diffusion Bridge (Affect-Diff)

arXiv:2605.08252 · 7.0

Predicted impact: top 97% in CV · last 90 days

Originality: Highly original

AI Analysis

For researchers in affective computing, this addresses the critical problem of minority emotion recognition under severe class imbalance, offering a principled causal-diffusion approach.

Affect-Diff tackles extreme class imbalance in multimodal emotion recognition on CMU-MOSEI, where standard models ignore minority emotions. It achieves an 18% relative improvement in validation balanced accuracy over the strongest baseline (0.384 vs 0.324 for TETFN), and its deterministic-encoder variant detects all six emotion classes, whereas every evaluated baseline yields zero F1 on Fear, Disgust, and Surprise.
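
To make the metric concrete, the sketch below computes balanced accuracy (the mean of per-class recall) for a model that always predicts the majority class, and reproduces the relative-improvement arithmetic for the two reported scores. The label counts are a hypothetical toy distribution chosen for illustration, not the CMU-MOSEI counts, and the helper names are assumptions rather than anything from the paper.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)

def relative_improvement(new, old):
    """Fractional gain of `new` over `old`."""
    return (new - old) / old

# Hypothetical toy label distribution (NOT the real CMU-MOSEI counts).
y_true = (["happy"] * 66 + ["sad"] * 15 + ["angry"] * 12
          + ["fear"] * 2 + ["disgust"] * 3 + ["surprise"] * 2)
# A degenerate model that always predicts the majority class.
y_pred = ["happy"] * len(y_true)

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)

# The two balanced-accuracy scores reported in the summary.
gain = relative_improvement(0.384, 0.324)

print(f"accuracy={accuracy:.3f}  balanced_accuracy={bal_acc:.3f}  gain={gain:.3f}")
```

On this toy data the majority-class predictor reaches plain accuracy 0.66 but balanced accuracy of only 1/6, since it recalls one of six classes. That gap is why balanced accuracy is the more informative metric under this kind of imbalance. The ratio of the two reported scores works out to a gain of roughly 0.185.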

Multimodal emotion recognition on CMU-MOSEI faces an extreme imbalance as Happy accounts for 65.9% of samples while three Ekman categories collectively represent under 7%, causing standard fusion models to maximize accuracy by ignoring minority emotions entirely. We present Affect-Diff, a Causal-Diffusion Bridge that addresses this through three jointly trained mechanisms: a NOTEARS-learned causal graph that re-weights modality contributions before fusion, a beta-VAE bottleneck for regularized latent compression, and a stop-gradiented 1D DDPM prior that structures the latent space against majority-class collapse. On 3,292 aligned CMU-MOSEI samples, Affect-Diff achieves validation balanced accuracy 0.384, an 18% relative improvement over the strongest baseline (TETFN: 0.324), while all evaluated baselines produce zero F1 on Fear, Disgust, and Surprise. Ablation studies confirm independent, non-redundant contributions from the diffusion prior (-24% without it) and causal graph (-13%). Notably, only the deterministic-encoder variant detects all six emotion classes, revealing KL regularization strength as a direct lever for minority-class sensitivity.
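
As background for the NOTEARS component, the sketch below evaluates the standard NOTEARS acyclicity function h(W) = tr(exp(W ∘ W)) − d, which is zero exactly when the weighted adjacency matrix W describes a directed acyclic graph. It is only an illustration of that constraint: the three-node graph, its edge weights, and the function names are hypothetical, and the abstract does not say how Affect-Diff parameterizes or optimizes its graph. The matrix exponential is approximated with a truncated power series to keep the example dependency-free.

```python
def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def notears_acyclicity(w, terms=30):
    """h(W) = tr(exp(W * W)) - d, with exp as a truncated power series."""
    d = len(w)
    # Elementwise square keeps every entry non-negative.
    m = [[w[i][j] ** 2 for j in range(d)] for i in range(d)]
    # Start from the k = 0 term of the series, the identity matrix.
    term = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    trace = float(d)
    for k in range(1, terms):
        term = matmul(term, m)
        term = [[x / k for x in row] for row in term]  # now M^k / k!
        trace += sum(term[i][i] for i in range(d))
    return trace - d

# Hypothetical 3-node graph: node 0 -> node 1 -> node 2 (acyclic).
w_dag = [[0.0, 0.8, 0.0],
         [0.0, 0.0, 0.5],
         [0.0, 0.0, 0.0]]

# The same graph with an extra edge 2 -> 0, which closes a cycle.
w_cyc = [row[:] for row in w_dag]
w_cyc[2][0] = 0.3

h_dag = notears_acyclicity(w_dag)
h_cyc = notears_acyclicity(w_cyc)
print(f"h(dag)={h_dag:.6f}  h(cyclic)={h_cyc:.6f}")
```

Because h is differentiable in W, it can be added to a training loss as a penalty or constraint, which is what lets a graph of this kind be learned jointly with other components by gradient descent.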
