LG · GN · Aug 24, 2025

ShortListing Model: A Streamlined Simplex Diffusion for Discrete Variable Generation

arXiv:2508.17345v1 · 3 citations · h-index: 11 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses a crucial problem in natural language processing and biological sequence design by offering a more efficient method for discrete variable generation, though it appears incremental as it builds on existing diffusion models.

The paper tackles the challenge of generative modeling for discrete variables by introducing the ShortListing Model (SLM), a simplex-based diffusion model that reduces complexity and enhances scalability, achieving competitive performance in applications like DNA promoter and enhancer design, protein design, and language modeling.

Generative modeling of discrete variables is challenging yet crucial for applications in natural language processing and biological sequence design. We introduce the ShortListing Model (SLM), a novel simplex-based diffusion model inspired by progressive candidate pruning. SLM operates on simplex centroids, reducing generation complexity and enhancing scalability. Additionally, SLM incorporates a flexible implementation of classifier-free guidance, enhancing unconditional generation performance. Extensive experiments on DNA promoter and enhancer design, protein design, and character-level and large-vocabulary language modeling demonstrate the competitive performance and strong potential of SLM. Our code can be found at https://github.com/GenSI-THUAIR/SLM.
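
To make the phrases "simplex centroids" and "progressive candidate pruning" concrete, here is a minimal toy sketch in Python. It is not the authors' implementation and is not taken from the linked repository. It assumes that a state is the uniform distribution over a set of surviving candidate tokens (the centroid of the simplex face spanned by those tokens), and that generation shrinks that set until one token remains. The names `centroid` and `prune_trajectory`, the `keep_ratio` parameter, and the random pruning rule are all hypothetical; in a trained model a network would decide which candidates to drop.

```python
import random


def centroid(candidates, vocab_size):
    """Uniform distribution over `candidates`: the centroid of the
    simplex face whose vertices are those tokens."""
    if not candidates:
        raise ValueError("candidate set must be non-empty")
    weight = 1.0 / len(candidates)
    return [weight if i in candidates else 0.0 for i in range(vocab_size)]


def prune_trajectory(target, vocab_size, keep_ratio=0.5, seed=0):
    """Toy shortlisting path from the full-vocabulary centroid to a one-hot
    vector on `target`. Pruning here is random (an assumption made purely
    for illustration); it only guarantees that `target` survives."""
    if not 0.0 < keep_ratio < 1.0:
        raise ValueError("keep_ratio must lie strictly between 0 and 1")
    rng = random.Random(seed)
    candidates = set(range(vocab_size))
    trajectory = [centroid(candidates, vocab_size)]
    while len(candidates) > 1:
        # Number of candidates that survive this step (always fewer than now).
        keep = max(1, int(len(candidates) * keep_ratio))
        others = sorted(candidates - {target})
        candidates = {target} | set(rng.sample(others, keep - 1))
        trajectory.append(centroid(candidates, vocab_size))
    return trajectory


if __name__ == "__main__":
    for step, point in enumerate(prune_trajectory(target=2, vocab_size=8)):
        print(step, [round(p, 3) for p in point])
```

Under these assumptions, every intermediate state is fully described by which candidates remain, so the path visits only a finite set of simplex points rather than arbitrary points on the simplex.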
