CL · Feb 19, 2025

TESS 2: A Large-Scale Generalist Diffusion Language Model

UW
arXiv:2502.13917v2 · 23 citations · h-index: 16 · Has Code · ACL
Originality: Incremental advance
AI Analysis

This work addresses the need for more controllable and efficient language models, though it appears incremental because it builds on existing diffusion and autoregressive methods.

The paper develops TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models and matches, and sometimes exceeds, strong autoregressive models.

We introduce TESS 2, a general instruction-following diffusion language model that outperforms contemporary instruction-tuned diffusion models, as well as matches and sometimes exceeds strong autoregressive (AR) models. We train TESS 2 by first adapting a strong AR model via continued pretraining with the usual cross-entropy as diffusion loss, and then performing further instruction tuning. We find that adaptation training as well as the choice of the base model is crucial for training good instruction-following diffusion models. We further propose reward guidance, a novel and modular inference-time guidance procedure to align model outputs without needing to train the underlying model. Finally, we show that TESS 2 further improves with increased inference-time compute, highlighting the utility of diffusion LMs in having fine-grained controllability over the amount of compute used at inference time. Code and models are available at https://github.com/hamishivi/tess-2.
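
The reward guidance idea in the abstract (steering outputs at inference time without training the underlying model) can be illustrated with a toy sketch. This is not the TESS 2 implementation: it assumes a single token position, and a fixed per-token reward vector stands in for a learned reward model. The function names, `guidance_scale`, and `steps` are hypothetical. The sketch nudges the model's logits along the gradient of the expected reward, so the guided distribution shifts toward higher-reward tokens while the model weights stay untouched.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_reward(logits, token_rewards):
    # Toy "reward model" (assumption): expected per-token reward under softmax(logits).
    probs = softmax(logits)
    return sum(p * r for p, r in zip(probs, token_rewards))

def reward_gradient(logits, token_rewards):
    # Analytic gradient of the expected reward w.r.t. the logits:
    # dR/dz_j = p_j * (r_j - R), where R is the expected reward.
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, token_rewards))
    return [p * (r - baseline) for p, r in zip(probs, token_rewards)]

def guide_logits(logits, token_rewards, guidance_scale=1.0, steps=1):
    # Gradient ascent on the logits only; no model parameters are updated.
    guided = list(logits)
    for _ in range(steps):
        grad = reward_gradient(guided, token_rewards)
        guided = [z + guidance_scale * g for z, g in zip(guided, grad)]
    return guided

# Hypothetical values: four-token vocabulary, token 1 is the most rewarded.
logits = [2.0, 1.0, 0.5, 0.0]
token_rewards = [0.0, 1.0, 0.2, 0.1]

before = expected_reward(logits, token_rewards)
guided = guide_logits(logits, token_rewards, guidance_scale=4.0, steps=5)
after = expected_reward(guided, token_rewards)
print(f"expected reward before: {before:.3f}, after: {after:.3f}")
```

In this toy setting, raising `steps` spends more inference-time computation for a stronger shift toward rewarded tokens, which loosely mirrors the compute controllability the abstract highlights.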

Code Implementations: 1 repo
