CL · Apr 5

DARE: Diffusion Large Language Models Alignment and Reinforcement Executor

arXiv:2604.04215 · 98.62 citations · Has Code
Predicted impact: top 2% in CL (last 90 days) · Originality: Synthesis-oriented
AI Analysis

This work addresses a practical engineering problem for researchers and developers working with dLLMs, reducing fragmentation and enabling fair comparisons, though its contribution is incremental because it builds on existing tools such as verl and OpenCompass.

The paper tackles fragmentation in post-training pipelines for diffusion large language models (dLLMs) by introducing DARE, an open framework that unifies supervised fine-tuning, preference optimization, and reinforcement learning. The result is a reusable substrate for reproducible benchmark evaluation and practical acceleration across model families such as LLaDA and Dream.
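To make "supervised fine-tuning" concrete for masked dLLMs, here is a minimal sketch of the masked-diffusion objective as it is commonly described for LLaDA-style models: prompt tokens stay visible, each response token is masked independently with probability t, and the cross-entropy on masked positions is reweighted by 1/t. This is an illustrative assumption about what such an SFT stage optimizes, not DARE's code. The names `MASK`, `mask_response`, and `masked_diffusion_sft_loss` are hypothetical, the per-token normalisation by response length is an assumption, and the model's log-probabilities are passed in rather than computed.

```python
import math
import random

MASK = -1  # hypothetical mask-token id used only in this sketch


def mask_response(prompt, response, t, rng):
    """Forward (noising) process: mask each response token with probability t.

    Prompt tokens are never masked, so the model is conditioned on the full prompt.
    """
    noised = [MASK if rng.random() < t else tok for tok in response]
    return list(prompt) + noised


def masked_diffusion_sft_loss(log_probs, response, xt, prompt_len, t):
    """One-sample estimate of a masked-diffusion SFT loss (illustrative).

    log_probs[i][v] is the model's log-probability of token v at response
    position i given the noised sequence xt. Only masked positions contribute;
    the sum is reweighted by 1/t and normalised by the response length.
    """
    if not 0.0 < t <= 1.0:
        raise ValueError("mask ratio t must lie in (0, 1]")
    total = 0.0
    for i, target in enumerate(response):
        if xt[prompt_len + i] == MASK:
            total -= log_probs[i][target]
    return total / (t * len(response))


if __name__ == "__main__":
    rng = random.Random(0)
    prompt, response = [5, 6], [1, 2, 3, 0]
    # Sample a mask ratio; the small lower bound avoids dividing by zero.
    t = rng.uniform(1e-3, 1.0)
    xt = mask_response(prompt, response, t, rng)
    # Stand-in model output: uniform log-probabilities over a 4-token vocabulary.
    uniform = [[math.log(1 / 4)] * 4 for _ in response]
    print(t, xt, masked_diffusion_sft_loss(uniform, response, xt, len(prompt), t))
```

Unlike an autoregressive SFT loss, which scores every response token left to right, this estimate only scores the positions that happened to be masked, and the 1/t factor compensates for how many positions that typically is.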

Diffusion large language models (dLLMs) are emerging as a compelling alternative to the dominant autoregressive models, replacing strictly sequential token generation with iterative denoising and parallel generation dynamics. However, their open-source ecosystem remains fragmented across model families and especially across post-training pipelines, where reinforcement learning objectives, rollout implementations, and evaluation scripts are often released as paper-specific codebases. This fragmentation slows research iteration, raises the engineering cost of reproduction, and makes fair comparison across algorithms difficult. We present DARE (dLLMs Alignment and Reinforcement Executor), an open framework for post-training and evaluating dLLMs. Built on top of verl [sheng2024hybridflow] and OpenCompass [2023opencompass], DARE unifies supervised fine-tuning, parameter-efficient fine-tuning, preference optimization, and dLLM-specific reinforcement learning under a shared execution stack for both masked and block diffusion language models. Across representative model families, including LLaDA, Dream, SDAR, and LLaDA2.x, DARE provides broad algorithmic coverage, reproducible benchmark evaluation, and practical acceleration. Extensive empirical results indicate that DARE serves as a reusable research substrate for developing, comparing, and deploying post-training methods for current and emerging dLLMs.
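The abstract contrasts sequential autoregressive decoding with iterative denoising and distinguishes masked from block diffusion models. The toy sketch below illustrates that difference under stated assumptions: generation starts from an all-masked continuation, a hypothetical `predict` function scores every position at each step, the most confident predictions are committed in parallel, and the rest stay masked. Setting `block_size` equal to `gen_len` denoises the whole continuation jointly, as a plain masked diffusion sampler would; smaller blocks are completed left to right, mimicking a block diffusion schedule. The confidence-based commit rule and all names here are illustrative assumptions, not the samplers used by DARE, LLaDA, Dream, or SDAR.

```python
import math

MASK = -1  # hypothetical mask-token id for this sketch


def denoise(prompt, gen_len, block_size, steps_per_block, predict):
    """Toy confidence-based iterative unmasking with an optional block schedule.

    predict(seq) -> list of (token, confidence) pairs, one per position in seq;
    a stand-in for a dLLM forward pass (hypothetical interface).
    """
    seq = list(prompt) + [MASK] * gen_len
    start = len(prompt)
    # Visit blocks left to right; a single block covers the whole continuation
    # when block_size >= gen_len.
    for b0 in range(start, start + gen_len, block_size):
        b1 = min(b0 + block_size, start + gen_len)
        for step in range(steps_per_block):
            masked = [i for i in range(b0, b1) if seq[i] == MASK]
            if not masked:
                break
            # Unmask an equal share of the remaining masked tokens per step;
            # on the last step this commits everything left in the block.
            k = math.ceil(len(masked) / (steps_per_block - step))
            preds = predict(seq)
            # Commit the k most confident predictions in parallel; keep the rest masked.
            masked.sort(key=lambda i: preds[i][1], reverse=True)
            for i in masked[:k]:
                seq[i] = preds[i][0]
    return seq[start:]


if __name__ == "__main__":
    def toy_predict(seq):
        # Pretend the model already knows the answer (token 100 + position)
        # and is more confident about earlier positions.
        return [(100 + i, 1.0 / (1 + i)) for i in range(len(seq))]

    print(denoise([1, 2, 3], gen_len=6, block_size=3, steps_per_block=2, predict=toy_predict))
    # prints [103, 104, 105, 106, 107, 108]
```

With two blocks of three tokens and two steps per block, the sampler makes four forward passes to produce six tokens, where an autoregressive decoder would make six. Because each block is fully committed before the next begins, finished blocks behave like a fixed prefix, which is one reason block-wise schedules are attractive for caching.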
