AI · MM · Jun 17, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning

arXiv:2406.11161v2 · 209 citations
Originality: Incremental advance
AI Analysis

This work addresses the challenge of capturing complex real-world emotional expressions for applications in human-computer interaction, education, and counseling. It appears incremental, however, because it builds on existing MLLM frameworks.

The paper tackles multimodal emotion recognition by introducing Emotion-LLaMA, a model that integrates audio, visual, and textual inputs through emotion-specific encoders and instruction tuning. It reports top scores on EMER (Clue Overlap 7.83, Label Overlap 6.25) and in zero-shot evaluation on DFEW (UAR 45.59, WAR 59.37).

Accurate emotion perception is crucial for various applications, including human-computer interaction, education, and counseling. However, traditional single-modality approaches often fail to capture the complexity of real-world emotional expressions, which are inherently multimodal. Moreover, existing Multimodal Large Language Models (MLLMs) face challenges in integrating audio and recognizing subtle facial micro-expressions. To address this, we introduce the MERR dataset, containing 28,618 coarse-grained and 4,487 fine-grained annotated samples across diverse emotional categories. This dataset enables models to learn from varied scenarios and generalize to real-world applications. Furthermore, we propose Emotion-LLaMA, a model that seamlessly integrates audio, visual, and textual inputs through emotion-specific encoders. By aligning features into a shared space and employing a modified LLaMA model with instruction tuning, Emotion-LLaMA significantly enhances both emotional recognition and reasoning capabilities. Extensive evaluations show that Emotion-LLaMA outperforms other MLLMs, achieving top scores in Clue Overlap (7.83) and Label Overlap (6.25) on EMER, an F1 score of 0.9036 on the MER2023-SEMI challenge, and the highest UAR (45.59) and WAR (59.37) in zero-shot evaluations on the DFEW dataset.
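
The DFEW results above are reported as UAR (unweighted average recall) and WAR (weighted average recall). As a minimal sketch of how the two metrics differ, assuming single-label predictions: UAR is the plain mean of per-class recall, while WAR weights each class by its number of samples and so equals overall accuracy. The function names and the toy labels below are invented for illustration and are not taken from the paper or its code.

```python
from collections import defaultdict


def recall_per_class(y_true, y_pred):
    """Return {class: recall} for every class that appears in y_true."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return {c: correct[c] / total[c] for c in total}


def uar(y_true, y_pred):
    """Unweighted average recall: every class counts equally."""
    recalls = recall_per_class(y_true, y_pred)
    return sum(recalls.values()) / len(recalls)


def war(y_true, y_pred):
    """Weighted average recall: classes weighted by support (overall accuracy)."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical toy labels with an imbalanced class distribution.
y_true = ["happy", "happy", "happy", "happy", "sad", "angry"]
y_pred = ["happy", "happy", "happy", "sad", "sad", "happy"]

print(f"UAR = {uar(y_true, y_pred):.4f}")
print(f"WAR = {war(y_true, y_pred):.4f}")
```

On imbalanced emotion datasets the two numbers can diverge noticeably, because a model that does well on frequent classes but misses rare ones keeps a high WAR while its UAR drops. The paper reports both as percentages.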

Code Implementations: 2 repos