CV, AI · Mar 14, 2025

Compound Expression Recognition via Large Vision-Language Models

arXiv:2503.11241v1 · 5 citations · h-index: 2
Originality: Incremental advance
AI Analysis

This work addresses the problem of understanding complex human emotions for applications in emotion analysis and human-computer interaction, representing an incremental improvement.

The paper tackles Compound Expression Recognition (CER) with a novel approach based on Large Vision-Language Models (LVLMs) and a two-stage fine-tuning process, reporting high accuracy on the RAF-DB dataset and strong zero-shot generalization on the C-EXPR-DB dataset.

Compound Expression Recognition (CER) is crucial for understanding human emotions and improving human-computer interaction. However, CER is challenging because facial expressions are complex and subtle emotional cues are difficult to capture. To address these issues, we propose a novel approach leveraging Large Vision-Language Models (LVLMs). Our method employs a two-stage fine-tuning process: first, pre-trained LVLMs are fine-tuned on basic facial expressions to establish foundational patterns; second, the model is further optimized on a compound-expression dataset to refine visual-language feature interactions. Our approach achieves high accuracy on the RAF-DB dataset and demonstrates strong zero-shot generalization on the C-EXPR-DB dataset, showcasing its potential for real-world applications in emotion analysis and human-computer interaction.
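The two-stage schedule in the abstract can be illustrated with a minimal, framework-agnostic Python sketch. It is not the authors' implementation: the prompt wording, the names `Sample`, `to_record` and `two_stage_finetune`, the `train_epoch` callback, and the compound label subset are hypothetical, and the LVLM itself is treated as an opaque model object. The point it shows is that stage 2 (compound expressions) starts from the model produced by stage 1 (basic expressions).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

# Illustrative label sets (assumptions, not taken from the paper):
# the seven basic expressions commonly used with RAF-DB, plus a few
# example compound labels.
BASIC_LABELS = ["Surprise", "Fear", "Disgust", "Happiness", "Sadness", "Anger", "Neutral"]
COMPOUND_LABELS = ["Happily Surprised", "Sadly Fearful", "Angrily Disgusted", "Fearfully Surprised"]

Record = Dict[str, str]


@dataclass
class Sample:
    image_path: str  # hypothetical path to a face image
    label: str


def to_record(sample: Sample, labels: Sequence[str]) -> Record:
    """Turn one labelled image into a hypothetical instruction-tuning record."""
    if sample.label not in labels:
        raise ValueError(f"label {sample.label!r} is not in the label set")
    prompt = (
        "Which facial expression does this person show? "
        "Answer with one of: " + ", ".join(labels) + "."
    )
    return {"image": sample.image_path, "prompt": prompt, "answer": sample.label}


def two_stage_finetune(
    model: object,
    basic_data: List[Sample],
    compound_data: List[Sample],
    train_epoch: Callable[[object, List[Record]], object],
    epochs: Tuple[int, int] = (1, 1),
) -> object:
    """Stage 1 trains on basic expressions, stage 2 on compound expressions.

    The model returned by stage 1 is the starting point of stage 2, so the
    compound stage refines what was learned on basic expressions.
    """
    stages = [
        ([to_record(s, BASIC_LABELS) for s in basic_data], epochs[0]),
        ([to_record(s, COMPOUND_LABELS) for s in compound_data], epochs[1]),
    ]
    for records, n_epochs in stages:
        for _ in range(n_epochs):
            # One pass of supervised fine-tuning; returns the updated model.
            model = train_epoch(model, records)
    return model
```

Passing `train_epoch` as a callable keeps the schedule independent of any particular LVLM training library; in practice it would wrap one pass of supervised fine-tuning over the records.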
