CVMar 16, 2023
Facial Affect Recognition based on Transformer Encoder and Audiovisual Fusion for the ABAW5 ChallengeZiyang Zhang, Liuwei An, Zishun Cui et al.
In this paper, we present our solutions for the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW), which includes four sub-challenges of Valence-Arousal (VA) Estimation, Expression (Expr) Classification, Action Unit (AU) Detection and Emotional Reaction Intensity (ERI) Estimation. The 5th ABAW competition focuses on facial affect recognition utilizing different modalities and datasets. In our work, we extract powerful audio and visual features using a large number of sota models. These features are fused by Transformer Encoder and TEMMA. Besides, to avoid the possible impact of large dimensional differences between various features, we design an Affine Module to align different features to the same dimension. Extensive experiments demonstrate that the superiority of the proposed method. For the VA Estimation sub-challenge, our method obtains the mean Concordance Correlation Coefficient (CCC) of 0.6066. For the Expression Classification sub-challenge, the average F1 Score is 0.4055. For the AU Detection sub-challenge, the average F1 Score is 0.5296. For the Emotional Reaction Intensity Estimation sub-challenge, the average pearson's correlations coefficient on the validation set is 0.3968. All of the results of four sub-challenges outperform the baseline with a large margin.
CLJul 8, 2024
PsycoLLM: Enhancing LLM for Psychological Understanding and EvaluationJinpeng Hu, Tengteng Dong, Luo Gang et al.
Mental health has attracted substantial attention in recent years and LLM can be an effective technology for alleviating this problem owing to its capability in text understanding and dialogue. However, existing research in this domain often suffers from limitations, such as training on datasets lacking crucial prior knowledge and evidence, and the absence of comprehensive evaluation methods. In this paper, we propose a specialized psychological large language model (LLM), named PsycoLLM, trained on a proposed high-quality psychological dataset, including single-turn QA, multi-turn dialogues and knowledge-based QA. Specifically, we construct multi-turn dialogues through a three-step pipeline comprising multi-turn QA generation, evidence judgment, and dialogue refinement. We augment this process with real-world psychological case backgrounds extracted from online platforms, enhancing the relevance and applicability of the generated data. Additionally, to compare the performance of PsycoLLM with other LLMs, we develop a comprehensive psychological benchmark based on authoritative psychological counseling examinations in China, which includes assessments of professional ethics, theoretical proficiency, and case analysis. The experimental results on the benchmark illustrate the effectiveness of PsycoLLM, which demonstrates superior performance compared to other LLMs.
CVMar 31, 2025
AMMSM: Adaptive Motion Magnification and Sparse Mamba for Micro-Expression RecognitionXuxiong Liu, Tengteng Dong, Fei Wang et al.
Micro-expressions are typically regarded as unconscious manifestations of a person's genuine emotions. However, their short duration and subtle signals pose significant challenges for downstream recognition. We propose a multi-task learning framework named the Adaptive Motion Magnification and Sparse Mamba (AMMSM) to address this. This framework aims to enhance the accurate capture of micro-expressions through self-supervised subtle motion magnification, while the sparse spatial selection Mamba architecture combines sparse activation with the advanced Visual Mamba model to model key motion regions and their valuable representations more effectively. Additionally, we employ evolutionary search to optimize the magnification factor and the sparsity ratios of spatial selection, followed by fine-tuning to improve performance further. Extensive experiments on two standard datasets demonstrate that the proposed AMMSM achieves state-of-the-art (SOTA) accuracy and robustness.