FIRMED: A Peak-Centered Multimodal Dataset with Fine-Grained Annotation for Emotion Recognition
FIRMED addresses the need for temporally localized supervision in multimodal affective computing and offers a practical benchmark for emotion recognition researchers. The contribution is incremental: it improves upon existing annotation methods.
To address temporal label noise in video-induced physiological datasets for emotion recognition, the researchers introduce FIRMED, a peak-centered multimodal dataset with fine-grained annotations. In benchmark experiments, FIRMED consistently outperformed whole-trial labeling, with an average gain of 3.8 percentage points across eight EEG-based classifiers.
Traditional video-induced physiological datasets usually rely on whole-trial labels, which introduce temporal label noise in dynamic emotion recognition. We present FIRMED, a peak-centered multimodal dataset based on an immediate-recall annotation paradigm, with synchronized EEG, ECG, GSR, PPG, and facial recordings from 35 participants. FIRMED provides event-centered timestamps, emotion labels, and intensity annotations, and its annotation quality is supported by subjective and physiological validation. Benchmark experiments show that FIRMED consistently outperforms whole-trial labeling, yielding an average gain of 3.8 percentage points across eight EEG-based classifiers, with further improvements under multimodal fusion. FIRMED provides a practical benchmark for temporally localized supervision in multimodal affective computing.
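The difference between whole-trial and peak-centered labeling can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the sampling rate, the window lengths, and the synthetic signal are all assumptions chosen for illustration, since the abstract does not specify them. Under whole-trial labeling, every window cut from a trial inherits the trial's single label. Under peak-centered labeling, only the segment around an annotated event timestamp is kept.

```python
from typing import List, Sequence


def whole_trial_windows(
    signal: Sequence[float], sampling_rate_hz: float, window_s: float
) -> List[List[float]]:
    """Split a full trial into non-overlapping windows.

    Under whole-trial labeling, every returned window would share
    the single label assigned to the trial.
    """
    step = int(round(window_s * sampling_rate_hz))
    if step <= 0:
        raise ValueError("window_s * sampling_rate_hz must be at least 1 sample")
    return [list(signal[i:i + step]) for i in range(0, len(signal) - step + 1, step)]


def peak_centered_window(
    signal: Sequence[float],
    sampling_rate_hz: float,
    peak_time_s: float,
    half_window_s: float,
) -> List[float]:
    """Return the samples within +/- half_window_s of an annotated peak.

    The window is clipped at the trial boundaries.
    """
    center = int(round(peak_time_s * sampling_rate_hz))
    half = int(round(half_window_s * sampling_rate_hz))
    start = max(0, center - half)
    stop = min(len(signal), center + half)
    return list(signal[start:stop])


# Hypothetical 10 s trial sampled at 100 Hz (values are just sample indices).
trial = [float(i) for i in range(1000)]

# Whole-trial labeling: five 2 s windows, all carrying the same trial label.
trial_windows = whole_trial_windows(trial, sampling_rate_hz=100.0, window_s=2.0)

# Peak-centered labeling: one 2 s segment around a hypothetical peak at 4.0 s.
peak_segment = peak_centered_window(
    trial, sampling_rate_hz=100.0, peak_time_s=4.0, half_window_s=1.0
)
```

In this toy setup the whole-trial approach yields five labeled windows, four of which do not contain the annotated peak. This is the kind of temporal label noise the abstract refers to.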