CL | AI | Jan 12

A Unified Framework for Emotion Recognition and Sentiment Analysis via Expert-Guided Multimodal Fusion with Large Language Models

arXiv:2601.07565v1 | 1 citation | h-index: 1
Originality: Highly original
AI Analysis

The paper addresses multimodal emotion recognition and sentiment analysis for human-computer interaction applications, and reports incremental improvements from its novel fusion techniques.

The paper proposes EGMF, a unified framework that combines expert-guided multimodal fusion with large language models. It reports consistent improvements over state-of-the-art methods on four benchmarks spanning English and Chinese (MELD, CHERMA, MOSEI, and SIMS-V2), along with superior cross-lingual robustness.

Multimodal emotion understanding requires effective integration of text, audio, and visual modalities for both discrete emotion recognition and continuous sentiment analysis. We present EGMF, a unified framework combining expert-guided multimodal fusion with large language models. Our approach features three specialized expert networks--a fine-grained local expert for subtle emotional nuances, a semantic correlation expert for cross-modal relationships, and a global context expert for long-range dependencies--adaptively integrated through hierarchical dynamic gating for context-aware feature selection. Enhanced multimodal representations are integrated with LLMs via pseudo token injection and prompt-based conditioning, enabling a single generative framework to handle both classification and regression through natural language generation. We employ LoRA fine-tuning for computational efficiency. Experiments on bilingual benchmarks (MELD, CHERMA, MOSEI, SIMS-V2) demonstrate consistent improvements over state-of-the-art methods, with superior cross-lingual robustness revealing universal patterns in multimodal emotional expressions across English and Chinese. We will release the source code publicly.
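The adaptive integration of the three experts can be illustrated with a minimal sketch. It is not the authors' implementation: it shows only a single-level softmax gate (the abstract describes a hierarchical one), and the names `softmax`, `gate_experts`, and the example vectors and gate scores are hypothetical. In a trained model the gate scores would come from a learned network conditioned on the input rather than being supplied by hand.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of raw gate scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_experts(
    expert_outputs: Sequence[Sequence[float]],
    gate_scores: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Fuse expert feature vectors as a softmax-weighted sum.

    Assumption: every expert emits a vector of the same length, and
    there is exactly one gate score per expert.
    """
    if len(expert_outputs) != len(gate_scores):
        raise ValueError("need one gate score per expert")
    weights = softmax(gate_scores)
    dim = len(expert_outputs[0])
    fused = [0.0] * dim
    for weight, output in zip(weights, expert_outputs):
        if len(output) != dim:
            raise ValueError("expert outputs must share one dimension")
        for i, value in enumerate(output):
            fused[i] += weight * value
    return fused, weights


# Hypothetical outputs of the local, semantic, and global experts.
local_out = [1.0, 0.0]
semantic_out = [0.0, 1.0]
global_out = [1.0, 1.0]

# Hypothetical gate scores; a higher score gives that expert more weight.
fused, weights = gate_experts(
    [local_out, semantic_out, global_out],
    [2.0, 0.5, 1.0],
)
print(weights, fused)
```

Because the weights sum to one, the fused vector is a convex combination of the expert outputs, so the gate shifts emphasis among experts per input without changing the feature scale.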

