I'm Fine, But My Voice Isn't: Cross-Modal Affective Dissonance Detection for Reflective Journaling
This work addresses the authenticity gap in digital journaling and is relevant to emotion regulation researchers and mental health applications; however, the domain gap between synthetic and real speech remains a significant limitation.
The authors formalize Cross-Modal Affective Dissonance Detection (CADD) to detect mismatches between textual and acoustic emotion in digital journaling, achieving a macro-F1 of 0.711 with a dual-encoder model that uses asymmetric cross-modal attention.
Digital journaling creates an authenticity gap: users consciously translate raw emotions into text, often sanitizing their narratives even in private writing. We formalize this as Cross-Modal Affective Dissonance Detection (CADD), a directional three-way classification task, grounded in Gross's process model of emotion regulation, that distinguishes Masking (positive text, negative acoustics), Coping (negative text, positive acoustics), and Congruent utterances. We present three further contributions: (i) CADD-Journal, a 1,800-sample TTS dataset with a shared-sentence-pool design that provably isolates acoustic signal from textual content; (ii) DACM, a dual-encoder model with asymmetric cross-modal attention that resolves a gradient degeneracy in pooled fusion and achieves a macro-F1 of 0.711, with a four-step ablation showing that asymmetric attention is the dominant driver (+0.242) while the DIM is effective only on cross-modal features (+0.033); and (iii) a quantification of the domain gap: zero-shot evaluation on three naturalistic corpora reveals a substantial gap between TTS-trained models and real speech, from which we identify two concrete requirements for future in-the-wild corpus construction. ReflectJournal, a proof-of-concept iOS application, operationalizes the framework and provides a deployment platform for naturalistic data collection.
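The directional label scheme, and one plausible reading of why a shared sentence pool keeps text from giving the label away, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: valence is binarized to positive/negative (the abstract does not say how neutral affect is handled), and `shared_pool_samples` is a hypothetical pairing in which every sentence is rendered with both acoustic valences. Under that pairing, a positive sentence appears as both Congruent and Masking, so a text-only model cannot separate those two classes.

```python
from enum import Enum
from itertools import product


class Valence(Enum):
    POSITIVE = "positive"
    NEGATIVE = "negative"


class CADDLabel(Enum):
    MASKING = "Masking"      # positive text, negative acoustics
    COPING = "Coping"        # negative text, positive acoustics
    CONGRUENT = "Congruent"  # text and acoustics agree


def cadd_label(text: Valence, acoustic: Valence) -> CADDLabel:
    """Map (text valence, acoustic valence) to a directional CADD label.

    Assumption: valence is binarized; neutral handling is not specified
    in the abstract.
    """
    if text is Valence.POSITIVE and acoustic is Valence.NEGATIVE:
        return CADDLabel.MASKING
    if text is Valence.NEGATIVE and acoustic is Valence.POSITIVE:
        return CADDLabel.COPING
    return CADDLabel.CONGRUENT


def shared_pool_samples(sentences: dict[str, Valence]):
    """Hypothetical shared-sentence-pool pairing: each sentence is rendered
    with BOTH acoustic valences, so its text alone cannot fix the label."""
    return [
        (sentence, text_val, acoustic, cadd_label(text_val, acoustic))
        for (sentence, text_val), acoustic in product(sentences.items(), Valence)
    ]


def labels_per_sentence(samples):
    """Collect the set of labels each sentence appears with."""
    seen: dict[str, set[CADDLabel]] = {}
    for sentence, _, _, label in samples:
        seen.setdefault(sentence, set()).add(label)
    return seen


if __name__ == "__main__":
    # Illustrative sentences only; not drawn from CADD-Journal.
    pool = {
        "I'm fine, really.": Valence.POSITIVE,
        "Today was exhausting.": Valence.NEGATIVE,
    }
    for sentence, labels in labels_per_sentence(shared_pool_samples(pool)).items():
        print(sentence, sorted(label.value for label in labels))
    # I'm fine, really. ['Congruent', 'Masking']
    # Today was exhausting. ['Congruent', 'Coping']
```

Because each sentence maps to two distinct labels, any signal that separates Masking from Congruent (or Coping from Congruent) must come from the acoustic channel, which is the property the shared-pool design is meant to guarantee.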