CLJun 16, 2023
ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive EvaluationGuangyu Wang, Guoxing Yang, Zongxin Du et al.
Large language models have exhibited exceptional performance on various Natural Language Processing (NLP) tasks, leveraging techniques such as the pre-training, and instruction fine-tuning. Despite these advances, their effectiveness in medical applications is limited, due to challenges such as factual inaccuracies, reasoning abilities, and lack grounding in real-world experience. In this study, we present ClinicalGPT, a language model explicitly designed and optimized for clinical scenarios. By incorporating extensive and diverse real-world data, such as medical records, domain-specific knowledge, and multi-round dialogue consultations in the training process, ClinicalGPT is better prepared to handle multiple clinical task. Furthermore, we introduce a comprehensive evaluation framework that includes medical knowledge question-answering, medical exams, patient consultations, and diagnostic analysis of medical records. Our results demonstrate that ClinicalGPT significantly outperforms other models in these tasks, highlighting the effectiveness of our approach in adapting large language models to the critical domain of healthcare.
CRJul 8, 2025Code
CAVGAN: Unifying Jailbreak and Defense of LLMs via Generative Adversarial Attacks on their Internal RepresentationsXiaohu Li, Yunfeng Ning, Zepeng Bao et al.
Security alignment enables the Large Language Model (LLM) to gain the protection against malicious queries, but various jailbreak attack methods reveal the vulnerability of this security mechanism. Previous studies have isolated LLM jailbreak attacks and defenses. We analyze the security protection mechanism of the LLM, and propose a framework that combines attack and defense. Our method is based on the linearly separable property of LLM intermediate layer embedding, as well as the essence of jailbreak attack, which aims to embed harmful problems and transfer them to the safe area. We utilize generative adversarial network (GAN) to learn the security judgment boundary inside the LLM to achieve efficient jailbreak attack and defense. The experimental results indicate that our method achieves an average jailbreak success rate of 88.85\% across three popular LLMs, while the defense success rate on the state-of-the-art jailbreak dataset reaches an average of 84.17\%. This not only validates the effectiveness of our approach but also sheds light on the internal security mechanisms of LLMs, offering new insights for enhancing model security The code and data are available at https://github.com/NLPGM/CAVGAN.
AIAug 18, 2025Code
FuSaR: A Fuzzification-Based Method for LRM Safety-Reasoning BalanceJianhao Chen, Mayi Xu, Xiaohu Li et al.
Large Reasoning Models (LRMs) have demonstrated impressive performance across various tasks due to their powerful reasoning capabilities. However, their safety performance remains a significant concern. In this paper, we explore the reasons behind the vulnerability of LRMs. Based on this, we propose a novel method to improve the safety of LLMs without sacrificing their reasoning capability. Specifically, we exploit the competition between LRM's reasoning ability and safety ability, and achieve jailbreak by improving LRM's reasoning performance to reduce its safety performance. We then introduce an alignment strategy based on Fuzzification to balance Safety-Reasoning (FuSaR), by detoxifying the harmful reasoning process, where both the dangerous entities and the dangerous procedures in the reasoning steps are hidden. FuSaR successfully mitigates safety risks while preserving core reasoning information. We validate this strategy through alignment experiments on several open-source LRMs using detoxified reasoning data. The results compared with existing baselines conclusively show that FuSaR is an efficient alignment strategy to simultaneously enhance both the reasoning capability and safety of LRMs.
AIJun 1, 2025Code
Aligning VLM Assistants with Personalized Situated CognitionYongqi Li, Shen Zhou, Xiaohu Li et al.
Vision-language models (VLMs) aligned with general human objectives, such as being harmless and hallucination-free, have become valuable assistants of humans in managing visual tasks. However, people with diversified backgrounds have different cognition even in the same situation. Consequently, they may have personalized expectations for VLM assistants. This highlights the urgent need to align VLM assistants with personalized situated cognition for real-world assistance. To study this problem, we first simplify it by characterizing individuals based on the sociological concept of Role-Set. Then, we propose to evaluate the individuals' actions to examine whether the personalized alignment is achieved. Further, we construct a benchmark named PCogAlignBench, which includes 18k instances and 20 individuals with different Role-Sets. Finally, we present a framework called PCogAlign, which constructs a cognition-aware and action-based reward model for personalized alignment. Experimental results and human evaluations demonstrate the reliability of the PCogAlignBench and the effectiveness of our proposed PCogAlign. We will open-source the constructed benchmark and code at https://github.com/NLPGM/PCogAlign.
LGMay 17, 2019
Stochastically Dominant Distributional Reinforcement LearningJohn D. Martin, Michal Lyskawinski, Xiaohu Li et al.
We describe a new approach for managing aleatoric uncertainty in the Reinforcement Learning (RL) paradigm. Instead of selecting actions according to a single statistic, we propose a distributional method based on the second-order stochastic dominance (SSD) relation. This compares the inherent dispersion of random returns induced by actions, producing a more comprehensive and robust evaluation of the environment's uncertainty. The necessary conditions for SSD require estimators to predict accurate second moments. To accommodate this, we map the distributional RL problem to a Wasserstein gradient flow, treating the distributional Bellman residual as a potential energy functional. We propose a particle-based algorithm for which we prove optimality and convergence. Our experiments characterize the algorithm performance and demonstrate how uncertainty and performance are better balanced using an \textsc{ssd} policy than with other risk measures.
CRMar 28, 2016
A Stochastic Model for Quantitative Security Analyses of Networked SystemsXiaohu Li, Paul Parker, Shouhuai Xu
Traditional security analyses are often geared towards cryptographic primitives or protocols. Although such analyses are necessary, they cannot address a defender's need for insight into {\em which aspects of a networked system having a significant impact on its security, and how to tune its configurations or parameters so as to improve security}. This question is known to be notoriously difficult to answer, and the state-of-the-art is that we know little about it. Towards ultimately addressing this question, this paper presents a stochastic model for quantifying security of networked systems. The resulting model captures two aspects of a networked system: (1) the strength of deployed security mechanisms such as intrusion detection systems, and (2) the underlying {\em vulnerability graph}, which reflects how attacks may proceed. The resulting model brings the following insights: (1) How should a defender "tune" system configurations (e.g., network topology) so as to improve security? (2) How should a defender "tune" system parameters (e.g., by upgrading which security mechanisms) so as to improve security? (3) Under what conditions is the steady-state number of compromised entities of interest below a given threshold with a high probability? Simulation studies are conducted to confirm the analytic results, and to show the tightness of the bounds of certain important metric that cannot be resolved analytically.