AI · CL · Jun 4

ProSarc: Prosody-Aware Sarcasm Recognition Framework via Temporal Prosodic Incongruity

Prathamjyot Singh, Ashima Sood, Sahil Sharma, Jasmeet Singh

arXiv:2606.06168

Predicted impact: top 64% in AI · last 90 days
Originality: Incremental advance

AI Analysis

For researchers in sarcasm detection and affective computing, ProSarc provides an audio-only method that leverages prosodic incongruity, outperforming prior audio-only approaches on MUStARD++ and generalizing to spontaneous and cross-lingual speech.

ProSarc detects sarcasm in speech by modeling temporal prosodic incongruity. It reaches an F1 of 75.3 on MUStARD++, ahead of prior audio-only methods, and generalizes to PodSarc (F1 62.9) and MuSaG (F1 65.6), while also providing uncertainty estimates and sarcasm onset localization.

We present ProSarc, an audio-only framework that detects sarcasm by modelling temporal prosodic incongruity, that is, the mismatch between local prosodic dynamics and the utterance-level emotional baseline. Dual encoding paths, a Global Emotion Encoder and a Temporal Prosody Encoder (BiLSTM + multi-head attention), feed a Prosodic Incongruity Analyzer that produces a scalar incongruity score for classification. Monte Carlo dropout provides uncertainty estimates, and an attention-based mechanism localises sarcastic onset without frame-level labels. ProSarc outperforms prior audio-only methods on MUStARD++ (F1=75.3) and generalises to spontaneous (PodSarc, F1=62.9) and cross-lingual speech (MuSaG, F1=65.6). Ten-run validation confirms the contribution of incongruity modelling (Wilcoxon p=0.002, Cohen's d=1.51). Human evaluation shows that model uncertainty tracks perceptual ambiguity and predicted onsets align with human-annotated temporal windows.
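To make the idea of temporal prosodic incongruity concrete, the following is a toy, hand-crafted sketch and not the authors' implementation. ProSarc uses learned encoders (BiLSTM + multi-head attention) and attention-based localisation; here a sliding-window deviation from the utterance mean stands in for the learned incongruity score, the most deviant window stands in for the onset, and a one-feature logistic classifier with dropout kept active at inference stands in for Monte Carlo dropout. All function names, the window size, the weights, and the pitch values are assumptions made for illustration.

```python
import math
import random
import statistics


def incongruity_profile(frames, window=3):
    """Per-window deviation of local prosody from the utterance-level baseline."""
    baseline = statistics.fmean(frames)        # utterance-level baseline
    spread = statistics.pstdev(frames) or 1.0  # fall back to 1.0 on a flat contour
    profile = []
    for start in range(len(frames) - window + 1):
        local = statistics.fmean(frames[start:start + window])
        profile.append(abs(local - baseline) / spread)
    return profile


def incongruity_score(frames, window=3):
    """Return (scalar score, start index of the most incongruent window)."""
    profile = incongruity_profile(frames, window)
    onset = max(range(len(profile)), key=profile.__getitem__)
    return profile[onset], onset


def mc_dropout_predict(score, weight=2.0, bias=-3.0, p_drop=0.3,
                       passes=200, seed=0):
    """Monte Carlo dropout on a one-feature logistic classifier.

    The weight and bias are hypothetical, not fitted values. Dropout stays
    active at inference, so repeated passes give a spread of probabilities.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(passes):
        keep = rng.random() >= p_drop                # Bernoulli dropout mask
        x = score / (1.0 - p_drop) if keep else 0.0  # inverted-dropout rescaling
        probs.append(1.0 / (1.0 + math.exp(-(weight * x + bias))))
    # Mean is the prediction; standard deviation is the uncertainty estimate.
    return statistics.fmean(probs), statistics.pstdev(probs)


# Made-up pitch contour (Hz): flat delivery with an exaggerated rise at the end.
pitch = [180, 182, 179, 181, 180, 183, 240, 250, 245]
score, onset = incongruity_score(pitch)
mean_prob, uncertainty = mc_dropout_predict(score)
print(f"score={score:.2f} onset_frame={onset} "
      f"p(sarcasm)={mean_prob:.2f} +/- {uncertainty:.2f}")
```

In this sketch the onset index is the first frame of the window that departs most from the baseline. No frame-level labels are involved, which mirrors the label-free localisation described in the abstract only in spirit.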
