CL · Oct 15, 2025

DSCD: Large Language Model Detoxification with Self-Constrained Decoding

Ming Dong, Jinkui Zhang, Bolong Zheng, Xinhui Tu, Po Hu, Tingting He

arXiv:2510.13183v1 · 3 citations · h-index: 12 · Has Code · EMNLP

Originality: Incremental advance

AI Analysis

This work addresses the challenge of making LLMs safer to deploy by reducing toxic outputs. The advance is incremental, as it builds on existing detoxification methods.

The paper tackles LLM detoxification by proposing DSCD, a method that modifies token distributions during decoding without fine-tuning. It reports state-of-the-art toxicity reduction and generation fluency, with better efficiency than existing methods.

Detoxification in large language models (LLMs) remains a significant research challenge. Existing decoding-time detoxification methods all rely on external constraints, which add resource overhead and reduce generation fluency. This work proposes Detoxification with Self-Constrained Decoding (DSCD), a novel method for LLM detoxification that requires no parameter fine-tuning. During output generation, DSCD strengthens the internal next-token distribution of the safety layer while weakening those of the hallucination and toxic layers, which diminishes toxicity and enhances output safety. DSCD is lightweight, highly compatible, and plug-and-play, and it integrates readily with existing detoxification methods for further performance improvement. Extensive experiments on representative open-source LLMs and public datasets validate DSCD's effectiveness, demonstrating state-of-the-art (SOTA) performance in both detoxification and generation fluency, with superior efficiency compared to existing methods. These results highlight DSCD's potential as a practical and scalable solution for safer LLM deployment.
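The abstract does not give the exact scoring rule, so the following is only a minimal sketch of the general idea of strengthening one layer's next-token distribution while weakening another's. It assumes a contrastive-style combination of log-probabilities; the function names, the weights `alpha` and `beta`, the toy vocabulary, and all logit values are hypothetical and are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def self_constrained_scores(final_logits, safe_logits, toxic_logits,
                            alpha=1.0, beta=1.0):
    """Combine per-layer next-token distributions (assumed rule, not the paper's).

    Adds alpha * log p_safe and subtracts beta * log p_toxic from the
    final-layer log-probabilities, then renormalizes.
    """
    final_lp = log_softmax(final_logits)
    safe_lp = log_softmax(safe_logits)
    toxic_lp = log_softmax(toxic_logits)
    combined = [f + alpha * s - beta * t
                for f, s, t in zip(final_lp, safe_lp, toxic_lp)]
    return log_softmax(combined)


def greedy_pick(scores):
    """Return the index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Toy three-token vocabulary with made-up logits for illustration only.
vocab = ["kind", "rude", "the"]
final_logits = [1.0, 1.2, 0.5]    # final layer slightly prefers "rude"
safe_logits = [2.0, -1.0, 0.5]    # hypothetical safety-layer readout
toxic_logits = [-1.0, 2.0, 0.5]   # hypothetical toxic-layer readout

baseline_token = vocab[greedy_pick(log_softmax(final_logits))]
adjusted_scores = self_constrained_scores(final_logits, safe_logits, toxic_logits)
adjusted_token = vocab[greedy_pick(adjusted_scores)]
print(baseline_token, adjusted_token)
```

In a real model the three logit vectors would come from projecting intermediate hidden states through the output head, and how the safety, hallucination, and toxic layers are identified is specified in the paper, not here. The sketch only shows why such a method needs no fine-tuning: it changes the scores used at each decoding step, not the model weights.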
