LG · Mar 21

Generating from Discrete Distributions Using Diffusions: Insights from Random Constraint Satisfaction Problems

arXiv:2603.20589 · 66.31 citations · h-index: 2
AI Analysis

This work provides insights for researchers developing generative techniques for discrete data, though it is incremental as it builds on existing benchmarks and methods.

The paper studies the problem of generating data from discrete distributions, using random constraint satisfaction problems as a benchmark. It finds that continuous diffusions outperform masked discrete diffusions, that learned diffusions can match the theoretical 'ideal' accuracy, and that smart variable ordering can significantly improve accuracy, though not when it follows popular heuristics.

Generating data from discrete distributions is important for a number of application domains including text, tabular data, and genomic data. Several groups have recently used random $k$-satisfiability ($k$-SAT) as a synthetic benchmark for new generative techniques. In this paper, we show that fundamental insights from the theory of random constraint satisfaction problems have observable implications (sometimes contradicting intuition) on the behavior of generative techniques on such benchmarks. More precisely, we study the problem of generating a uniformly random solution of a given (random) $k$-SAT or $k$-XORSAT formula. Among other findings, we observe that: $(i)$~Continuous diffusions outperform masked discrete diffusions; $(ii)$~Learned diffusions can match the theoretical `ideal' accuracy; $(iii)$~Smart ordering of the variables can significantly improve accuracy, although not following popular heuristics.
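To make the benchmark task concrete, the following is a minimal Python sketch of what "generating a uniformly random solution of a given (random) $k$-SAT formula" means for a tiny instance. It is not the paper's method: it enumerates all assignments by brute force, which is feasible only for a handful of variables and serves purely as an exact reference sampler. The function names (`random_ksat`, `satisfies`, `all_solutions`, `sample_uniform_solution`), the clause encoding, and the parameters `n = 12`, `m = 24`, `k = 3` are illustrative assumptions, not taken from the paper.

```python
import itertools
import random


def random_ksat(n, m, k, rng):
    """Draw m clauses; each uses k distinct variables with independent random signs.

    A literal is encoded as (variable_index, negated). This encoding is an
    assumption made for illustration only.
    """
    formula = []
    for _ in range(m):
        variables = rng.sample(range(n), k)  # k distinct variable indices
        formula.append([(v, rng.random() < 0.5) for v in variables])
    return formula


def satisfies(assignment, formula):
    """Return True if every clause contains at least one true literal.

    The literal (v, negated) is true exactly when assignment[v] != negated.
    """
    return all(
        any(assignment[v] != negated for v, negated in clause)
        for clause in formula
    )


def all_solutions(n, formula):
    """Enumerate all 2**n assignments and keep the satisfying ones (brute force)."""
    return [
        a for a in itertools.product([False, True], repeat=n)
        if satisfies(a, formula)
    ]


def sample_uniform_solution(n, formula, rng):
    """Pick one satisfying assignment uniformly at random, or None if unsatisfiable."""
    solutions = all_solutions(n, formula)
    return rng.choice(solutions) if solutions else None


rng = random.Random(0)
n, k = 12, 3
formula = random_ksat(n, m=24, k=k, rng=rng)
sample = sample_uniform_solution(n, formula, rng)
```

A generative model trained on such a benchmark would be judged by how closely its samples resemble draws from this uniform distribution over solutions; the exhaustive sampler above only plays the role of ground truth on instances small enough to enumerate.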

