CLSep 20, 2024
Depression Diagnosis Dialogue Simulation: Self-improving Psychiatrist with Tertiary MemoryKunyao Lan, Bingrui Jin, Zichen Zhu et al.
Mental health issues, particularly depressive disorders, present significant challenges in contemporary society, necessitating the development of effective automated diagnostic methods. This paper introduces the Agent Mental Clinic (AMC), a self-improving conversational agent system designed to enhance depression diagnosis through simulated dialogues between patient and psychiatrist agents. To enhance the dialogue quality and diagnosis accuracy, we design a psychiatrist agent consisting of a tertiary memory structure, a dialogue control and reflect plugin that acts as ``supervisor'' and a memory sampling module, fully leveraging the skills reflected by the psychiatrist agent, achieving great accuracy on depression risk and suicide risk diagnosis via conversation. Experiment results on datasets collected in real-life scenarios demonstrate that the system, simulating the procedure of training psychiatrists, can be a promising optimization method for aligning LLMs with real-life distribution in specific domains without modifying the weights of LLMs, even when only a few representative labeled cases are available.
CLMar 24
Synthetic or Authentic? Building Mental Patient Simulators from Longitudinal EvidenceBaihan Li, Bingrui Jin, Kunyao Lan et al.
Patient simulation is essential for developing and evaluating mental health dialogue systems. As most existing approaches rely on snapshot-style prompts with limited profile information, homogeneous behaviors and incoherent disease progression in multi-turn interactions have become key chellenges. In this work, we propose DEPROFILE, a data-grounded patient simulation framework that constructs unified, multi-source patient profiles by integrating demographic attributes, standardized clinical symptoms, counseling dialogues, and longitudinal life-event histories from real-world data. We further introduce a Chain-of-Change agent to transform noisy longitudinal records into structured, temporally grounded memory representations for simulation. Experiments across multiple large language model (LLM) backbones show that with more comprehensive profile constructed by DEPROFILE, the dialogue realism, behavioral diversity, and event richness have consistently improved and exceed state-of-the-art baselines, highlighting the importance of grounding patient simulation in verifiable longitudinal evidence.
MAOct 17, 2024
MobA: Multifaceted Memory-Enhanced Adaptive Planning for Efficient Mobile Task AutomationZichen Zhu, Hao Tang, Yansi Li et al.
Existing Multimodal Large Language Model (MLLM)-based agents face significant challenges in handling complex GUI (Graphical User Interface) interactions on devices. These challenges arise from the dynamic and structured nature of GUI environments, which integrate text, images, and spatial relationships, as well as the variability in action spaces across different pages and tasks. To address these limitations, we propose MobA, a novel MLLM-based mobile assistant system. MobA introduces an adaptive planning module that incorporates a reflection mechanism for error recovery and dynamically adjusts plans to align with the real environment contexts and action module's execution capacity. Additionally, a multifaceted memory module provides comprehensive memory support to enhance adaptability and efficiency. We also present MobBench, a dataset designed for complex mobile interactions. Experimental results on MobBench and AndroidArena demonstrate MobA's ability to handle dynamic GUI environments and perform complex mobile tasks.
AIDec 25, 2024
AdaEAGLE: Optimizing Speculative Decoding via Explicit Modeling of Adaptive Draft StructuresSituo Zhang, Hankun Wang, Da Ma et al.
Speculative Decoding (SD) is a popular lossless technique for accelerating the inference of Large Language Models (LLMs). We show that the decoding speed of SD frameworks with static draft structures can be significantly improved by incorporating context-aware adaptive draft structures. However, current studies on adaptive draft structures are limited by their performance, modeling approaches, and applicability. In this paper, we introduce AdaEAGLE, the first SD framework that explicitly models adaptive draft structures. AdaEAGLE leverages the Lightweight Draft Length Predictor (LDLP) module to explicitly predict the optimal number of draft tokens during inference to guide the draft model. It achieves comparable speedup results without manual thresholds and allows for deeper, more specialized optimizations. Moreover, together with threshold-based strategies, AdaEAGLE achieves a $1.62\times$ speedup over the vanilla AR decoding and outperforms fixed-length SotA baseline while maintaining output quality.
AIApr 7, 2024
Towards Reliable and Empathetic Depression-Diagnosis-Oriented ChatsKunyao Lan, Cong Ming, Binwei Yao et al.
Chatbots can serve as a viable tool for preliminary depression diagnosis via interactive conversations with potential patients. Nevertheless, the blend of task-oriented and chit-chat in diagnosis-related dialogues necessitates professional expertise and empathy. Such unique requirements challenge traditional dialogue frameworks geared towards single optimization goals. To address this, we propose an innovative ontology definition and generation framework tailored explicitly for depression diagnosis dialogues, combining the reliability of task-oriented conversations with the appeal of empathy-related chit-chat. We further apply the framework to D$^4$, the only existing public dialogue dataset on depression diagnosis-oriented chats. Exhaustive experimental results indicate significant improvements in task completion and emotional support generation in depression diagnosis, fostering a more comprehensive approach to task-oriented chat dialogue system development and its applications in digital mental health.
AIOct 29, 2025
From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric ComorbidityTianxi Wan, Jiaming Luo, Siyuan Chen et al.
Psychiatric comorbidity is clinically significant yet challenging due to the complexity of multiple co-occurring disorders. To address this, we develop a novel approach integrating synthetic patient electronic medical record (EMR) construction and multi-agent diagnostic dialogue generation. We create 502 synthetic EMRs for common comorbid conditions using a pipeline that ensures clinical relevance and diversity. Our multi-agent framework transfers the clinical interview protocol into a hierarchical state machine and context tree, supporting over 130 diagnostic states while maintaining clinical standards. Through this rigorous process, we construct PsyCoTalk, the first large-scale dialogue dataset supporting comorbidity, containing 3,000 multi-turn diagnostic dialogues validated by psychiatrists. This dataset enhances diagnostic accuracy and treatment planning, offering a valuable resource for psychiatric comorbidity research. Compared to real-world clinical transcripts, PsyCoTalk exhibits high structural and linguistic fidelity in terms of dialogue length, token distribution, and diagnostic reasoning strategies. Licensed psychiatrists confirm the realism and diagnostic validity of the dialogues. This dataset enables the development and evaluation of models capable of multi-disorder psychiatric screening in a single conversational pass.
CLMar 9, 2025
Delusions of Large Language ModelsHongshen Xu, Zixv yang, Zichen Zhu et al.
Large Language Models often generate factually incorrect but plausible outputs, known as hallucinations. We identify a more insidious phenomenon, LLM delusion, defined as high belief hallucinations, incorrect outputs with abnormally high confidence, making them harder to detect and mitigate. Unlike ordinary hallucinations, delusions persist with low uncertainty, posing significant challenges to model reliability. Through empirical analysis across different model families and sizes on several Question Answering tasks, we show that delusions are prevalent and distinct from hallucinations. LLMs exhibit lower honesty with delusions, which are harder to override via finetuning or self reflection. We link delusion formation with training dynamics and dataset noise and explore mitigation strategies such as retrieval augmented generation and multi agent debating to mitigate delusions. By systematically investigating the nature, prevalence, and mitigation of LLM delusions, our study provides insights into the underlying causes of this phenomenon and outlines future directions for improving model reliability.
IRMar 3, 2025
A Diverse and Effective Retrieval-Based Debt Collection System with Expert KnowledgeJiaming Luo, Weiyi Luo, Guoqing Sun et al.
Designing effective debt collection systems is crucial for improving operational efficiency and reducing costs in the financial industry. However, the challenges of maintaining script diversity, contextual relevance, and coherence make this task particularly difficult. This paper presents a debt collection system based on real debtor-collector data from a major commercial bank. We construct a script library from real-world debt collection conversations, and propose a two-stage retrieval based response system for contextual relevance. Experimental results show that our system improves script diversity, enhances response relevance, and achieves practical deployment efficiency through knowledge distillation. This work offers a scalable and automated solution, providing valuable insights for advancing debt collection practices in real-world applications.
CLMay 23, 2023
LLM-empowered Chatbots for Psychiatrist and Patient Simulation: Application and EvaluationSiyuan Chen, Mengyue Wu, Kenny Q. Zhu et al.
Empowering chatbots in the field of mental health is receiving increasing amount of attention, while there still lacks exploration in developing and evaluating chatbots in psychiatric outpatient scenarios. In this work, we focus on exploring the potential of ChatGPT in powering chatbots for psychiatrist and patient simulation. We collaborate with psychiatrists to identify objectives and iteratively develop the dialogue system to closely align with real-world scenarios. In the evaluation experiments, we recruit real psychiatrists and patients to engage in diagnostic conversations with the chatbots, collecting their ratings for assessment. Our findings demonstrate the feasibility of using ChatGPT-powered chatbots in psychiatric scenarios and explore the impact of prompt designs on chatbot behavior and user experience.