Cong Cai

CV
3papers
35citations
Novelty32%
AI Score22

3 Papers

ASMar 25, 2022
EmotionNAS: Two-stream Neural Architecture Search for Speech Emotion Recognition

Haiyang Sun, Zheng Lian, Bin Liu et al.

Speech emotion recognition (SER) is an important research topic in human-computer interaction. Existing works mainly rely on human expertise to design models. Despite their success, different datasets often require distinct structures and hyperparameters. Searching for an optimal model for each dataset is time-consuming and labor-intensive. To address this problem, we propose a two-stream neural architecture search (NAS) based framework, called \enquote{EmotionNAS}. Specifically, we take two-stream features (i.e., handcrafted and deep features) as the inputs, followed by NAS to search for the optimal structure for each stream. Furthermore, we incorporate complementary information in different streams through an efficient information supplement module. Experimental results demonstrate that our method outperforms existing manually-designed and NAS-based models, setting the new state-of-the-art record.

CVJul 23, 2022
Two-Aspect Information Fusion Model For ABAW4 Multi-task Challenge

Haiyang Sun, Zheng Lian, Bin Liu et al.

In this paper, we propose the solution to the Multi-Task Learning (MTL) Challenge of the 4th Affective Behavior Analysis in-the-wild (ABAW) competition. The task of ABAW is to predict frame-level emotion descriptors from videos: discrete emotional state; valence and arousal; and action units. Although researchers have proposed several approaches and achieved promising results in ABAW, current works in this task rarely consider interactions between different emotion descriptors. To this end, we propose a novel end to end architecture to achieve full integration of different types of information. Experimental results demonstrate the effectiveness of our proposed solution.

CVJul 17, 2024
MDPE: A Multimodal Deception Dataset with Personality and Emotional Characteristics

Cong Cai, Shan Liang, Xuefei Liu et al.

Deception detection has garnered increasing attention in recent years due to the significant growth of digital media and heightened ethical and security concerns. It has been extensively studied using multimodal methods, including video, audio, and text. In addition, individual differences in deception production and detection are believed to play a crucial role.Although some studies have utilized individual information such as personality traits to enhance the performance of deception detection, current systems remain limited, partly due to a lack of sufficient datasets for evaluating performance. To address this issue, we introduce a multimodal deception dataset MDPE. Besides deception features, this dataset also includes individual differences information in personality and emotional expression characteristics. It can explore the impact of individual differences on deception behavior. It comprises over 104 hours of deception and emotional videos from 193 subjects. Furthermore, we conducted numerous experiments to provide valuable insights for future deception detection research. MDPE not only supports deception detection, but also provides conditions for tasks such as personality recognition and emotion recognition, and can even study the relationships between them. We believe that MDPE will become a valuable resource for promoting research in the field of affective computing.