TrustLDM: Benchmarking Trustworthiness in Language Diffusion Models

Yichuan Mo, Yukun Jiang, Yanbo Shi, Mingjie Li, Michael Backes, Yang Zhang, Yisen Wang

arXiv:2606.00023 · 34.11 citations · h-index: 18 · Has Code

Predicted impact: top 32% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For researchers and developers of language diffusion models, this work provides the first comprehensive trustworthiness benchmark, highlighting new risks from flexible decoding strategies.

The paper introduces TrustLDM, a benchmark for evaluating trustworthiness (safety, privacy, fairness) of language diffusion models (LDMs). It finds that LDMs are generally trustworthy with user prompts alone but degrade when malicious contexts are attached, and proposes an automatic evaluation framework revealing vulnerabilities across models.

The rapid development of Language Diffusion Models (LDMs) challenges the dominance of auto-regressive models in language processing. However, their flexible, any-order decoding strategies not only enable fast decoding but may also introduce new trustworthiness challenges. To better understand the risks in their pipelines, we introduce TrustLDM, a comprehensive trustworthiness benchmark tailored to LDMs that evaluates safety, privacy, and fairness across different LDM architectures with multiple categories of static post contexts. Our empirical results show that although LDMs generally exhibit strong trustworthiness when given only the user prompt, their alignment behavior degrades noticeably when malicious post contexts are attached to the masked responses. We further observe that longer contexts do not necessarily induce stronger effects, and that both decoding order and generation length affect the evaluation outcomes. Finally, we propose TrustLDM-Auto, an automatic evaluation framework that leverages LDM decoding flexibility to systematically identify vulnerable configurations, revealing substantial trustworthiness weaknesses across all evaluated models and dimensions. We hope this work helps the community build more trustworthy LDMs. Our code is available at https://github.com/PKU-ML/TrustLDM.
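To make the idea of a post context attached to a masked response more concrete, here is a minimal sketch of how such an input could be laid out. It is not taken from the TrustLDM code. The `[MASK]` symbol, the word-level tokens, the function names, and the choice to place the post context directly after the masked slots are all assumptions made for illustration.

```python
from typing import List

# Hypothetical mask symbol; a real LDM uses its own tokenizer-specific mask token.
MASK = "[MASK]"


def build_masked_sequence(
    prompt: List[str], gen_len: int, post_context: List[str]
) -> List[str]:
    """Lay out the prompt, then gen_len masked response slots, then a fixed post context.

    Assumption: the post context sits right after the masked slots and stays
    unmasked, so the model conditions on it while filling in the response.
    """
    if gen_len < 0:
        raise ValueError("gen_len must be non-negative")
    return prompt + [MASK] * gen_len + post_context


def masked_positions(sequence: List[str]) -> List[int]:
    """Return the indices that are still masked, i.e. the slots left to decode."""
    return [i for i, token in enumerate(sequence) if token == MASK]


# Example layout: a 4-token prompt, 3 response slots, and a 2-token post context.
seq = build_masked_sequence(["Tell", "me", "a", "story"], 3, ["The", "End"])
slots = masked_positions(seq)
```

Because any-order decoding may fill the slots in `slots` in any sequence, the decoding order and the number of slots (`gen_len`) are natural knobs to vary in such a sketch, which matches the two factors the abstract reports as affecting evaluation outcomes.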

