CL | LG | Feb 11, 2025

Tractable Transformers for Flexible Conditional Generation

arXiv:2502.07616v2 | 1 citation | h-index: 41 | ICML
Originality: Incremental advance
AI Analysis

The paper addresses a bottleneck in conditional generation for NLP applications, offering a more robust model for tasks like text generation. The advance is incremental because it builds on existing Transformer architectures.

The paper tackles the problem of non-autoregressive models underperforming in conditional generation tasks by proposing Tractable Transformers (Tracformer), which incorporate a sparse Transformer encoder to capture both local and global context. Tracformers achieve state-of-the-art conditional generation performance on text modeling compared to recent diffusion and autoregressive baselines.

Non-autoregressive (NAR) generative models are valuable because they can handle diverse conditional generation tasks in a more principled way than their autoregressive (AR) counterparts, which are constrained by sequential dependency requirements. Recent advancements in NAR models, such as diffusion language models, have demonstrated superior performance in unconditional generation compared to AR models (e.g., GPTs) of similar sizes. However, such improvements do not always lead to improved conditional generation performance. We show that a key reason for this gap is the difficulty in generalizing to conditional probability queries (i.e., the set of unknown variables) unseen during training. As a result, strong unconditional generation performance does not guarantee high-quality conditional generation. This paper proposes Tractable Transformers (Tracformer), a Transformer-based generative model that is more robust to different conditional generation tasks. Unlike existing models that rely solely on global contextual features derived from full inputs, Tracformers incorporate a sparse Transformer encoder to capture both local and global contextual information. This information is routed through a decoder for conditional generation. Empirical results demonstrate that Tracformers achieve state-of-the-art conditional generation performance on text modeling compared to recent diffusion and AR model baselines.
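To make the notion of a conditional probability query concrete, the sketch below represents a query as a boolean mask over sequence positions, where True marks an unknown variable to be generated and False marks observed context. This is an illustration only, not the paper's training or sampling procedure: the function names (`sample_query_mask`, `is_suffix_query`, `count_queries`) and the random masking scheme are assumptions introduced here. It shows why generalizing across queries is hard: the number of possible masks grows exponentially with sequence length, while left-to-right AR decoding natively covers only the suffix-shaped ones.

```python
import random


def sample_query_mask(seq_len, rng):
    """Sample a query mask: True marks an unknown position to generate.

    Assumption (illustrative, not from the paper): a masking rate is drawn
    uniformly, then each position is masked independently at that rate.
    """
    rate = rng.random()
    mask = [rng.random() < rate for _ in range(seq_len)]
    if not any(mask):
        # Guarantee at least one unknown so the query is non-trivial.
        mask[rng.randrange(seq_len)] = True
    return mask


def is_suffix_query(mask):
    """True when the unknowns form one contiguous suffix of the sequence.

    This is the query shape that plain left-to-right AR decoding handles
    directly: condition on a prefix, generate everything after it.
    """
    if True not in mask:
        return False
    first_unknown = mask.index(True)
    return all(mask[first_unknown:])


def count_queries(seq_len):
    """Return (all non-empty unknown sets, suffix-shaped unknown sets)."""
    total = 2 ** seq_len - 1  # every non-empty subset of positions
    suffix = seq_len          # one suffix per possible start position
    return total, suffix


if __name__ == "__main__":
    rng = random.Random(0)
    mask = sample_query_mask(8, rng)
    print("mask:", mask, "suffix-shaped:", is_suffix_query(mask))
    print("queries for length 8 (total, suffix):", count_queries(8))
```

For a length-8 sequence, `count_queries` gives 255 possible queries, of which only 8 are suffix-shaped. A model trained on a limited family of masks sees a small fraction of this space, which is the generalization gap the abstract describes.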

