CLDec 17, 2024Code
Can Large Language Models Understand You Better? An MBTI Personality Detection Dataset Aligned with Population TraitsBohan Li, Jiannan Guan, Longxu Dou et al.
The Myers-Briggs Type Indicator (MBTI) is one of the most influential personality theories reflecting individual differences in thinking, feeling, and behaving. MBTI personality detection has garnered considerable research interest and has evolved significantly over the years. However, this task tends to be overly optimistic, as it currently does not align well with the natural distribution of population personality traits. Specifically, (1) the self-reported labels in existing datasets result in incorrect labeling issues, and (2) the hard labels fail to capture the full range of population personality distributions. In this paper, we optimize the task by constructing MBTIBench, the first manually annotated high-quality MBTI personality detection dataset with soft labels, under the guidance of psychologists. As for the first challenge, MBTIBench effectively solves the incorrect labeling issues, which account for 29.58% of the data. As for the second challenge, we estimate soft labels by deriving the polarity tendency of samples. The obtained soft labels confirm that there are more people with non-extreme personality traits. Experimental results not only highlight the polarized predictions and biases in LLMs as key directions for future research, but also confirm that soft labels can provide more benefits to other psychological tasks than hard labels. The code and data are available at https://github.com/Personality-NLP/MbtiBench.
CLNov 12, 2025
CARE-Bench: A Benchmark of Diverse Client Simulations Guided by Expert Principles for Evaluating LLMs in Psychological CounselingBichen Wang, Yixin Sun, Junzhe Wang et al.
The mismatch between the growing demand for psychological counseling and the limited availability of services has motivated research into the application of Large Language Models (LLMs) in this domain. Consequently, there is a need for a robust and unified benchmark to assess the counseling competence of various LLMs. Existing works, however, are limited by unprofessional client simulation, static question-and-answer evaluation formats, and unidimensional metrics. These limitations hinder their effectiveness in assessing a model's comprehensive ability to handle diverse and complex clients. To address this gap, we introduce \textbf{CARE-Bench}, a dynamic and interactive automated benchmark. It is built upon diverse client profiles derived from real-world counseling cases and simulated according to expert guidelines. CARE-Bench provides a multidimensional performance evaluation grounded in established psychological scales. Using CARE-Bench, we evaluate several general-purpose LLMs and specialized counseling models, revealing their current limitations. In collaboration with psychologists, we conduct a detailed analysis of the reasons for LLMs' failures when interacting with clients of different types, which provides directions for developing more comprehensive, universal, and effective counseling models.
CVMar 6
Text-Driven Emotionally Continuous Talking Face GenerationHao Yang, Yanyan Zhao, Tian Zheng et al.
Talking Face Generation (TFG) strives to create realistic and emotionally expressive digital faces. While previous TFG works have mastered the creation of naturalistic facial movements, they typically express a fixed target emotion in synthetic videos and lack the ability to exhibit continuously changing and natural expressions like humans do when conveying information. To synthesize realistic videos, we propose a novel task called Emotionally Continuous Talking Face Generation (EC-TFG), which takes a text segment and an emotion description with varying emotions as driving data, aiming to generate a video where the person speaks the text while reflecting the emotional changes within the description. Alongside this, we introduce a customized model, i.e., Temporal-Intensive Emotion Modulated Talking Face Generation (TIE-TFG), which innovatively manages dynamic emotional variations by employing Temporal-Intensive Emotion Fluctuation Modeling, allowing it to provide emotion variation sequences corresponding to the input text to drive continuous facial expression changes in synthesized videos. Extensive evaluations demonstrate our method's exceptional ability to produce smooth emotion transitions and uphold high-quality visuals and motion authenticity across diverse emotional states.
CLJun 4, 2024
RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language ModelsBichen Wang, Yuzhe Zi, Yixin Sun et al.
With the passage of the Right to Be Forgotten (RTBF) regulations and the scaling up of language model training datasets, research on model unlearning in large language models (LLMs) has become more crucial. Before the era of LLMs, machine unlearning research focused mainly on classification tasks in models with small parameters. In these tasks, the content to be forgotten or retained is clear and straightforward. However, as parameter sizes have grown and tasks have become more complex, balancing forget quality and model utility has become more challenging, especially in scenarios involving personal data instead of classification results. Existing methods based on gradient ascent and its variants often struggle with this balance, leading to unintended information loss or partial forgetting. To address this challenge, we propose RKLD, a novel \textbf{R}everse \textbf{KL}-Divergence-based Knowledge \textbf{D}istillation unlearning algorithm for LLMs targeting the unlearning of personal information. Through RKLD, we achieve significant forget quality and effectively maintain the model utility in our experiments.