LG · DS · May 12

The tractability landscape of diffusion alignment: regularization, rewards, and computational primitives

Ankur Moitra, Andrej Risteski, Dhruv Rohatgi

arXiv:2605.11361 · 72.61 citations

Predicted impact: top 24% in LG · last 90 days

Originality: Incremental advance

AI Analysis

For researchers in generative AI and alignment, this work clarifies how the computational tractability of reward alignment depends on the chosen closeness constraint, providing a primitive-based framework.

The paper studies inference-time reward alignment for diffusion models, showing that the choice of distributional distance (KL vs. Wasserstein) determines which algorithmic primitives suffice and which reward classes are tractable. For KL, linear exponential tilts enable alignment to convex low-dimensional rewards; for Wasserstein, a proximal transport oracle handles concave or low-dimensional Lipschitz rewards.

Inference-time reward alignment asks how to turn a pre-trained diffusion model with base law $p$ into a sampler that favors a reward $r$ while remaining close to $p$. Since there is no canonical distributional distance for this closeness constraint, different choices lead to different "reward-aligned" laws and, just as importantly, different algorithmic problems. We develop a primitive-based approach to reward alignment: rather than assuming arbitrary reward-aligned laws can be sampled, we ask which simple algorithmic primitives suffice to implement alignment for non-trivial reward classes. If closeness is measured in KL divergence, the target law is $q(x) \propto p(x) \exp(\lambda^{-1} r(x))$. For this setting, we show that linear exponential tilts of the form $q(x) \propto p(x) \exp(\langle \theta, x \rangle)$, which according to recent work [MRR26] can be sampled efficiently, are a sufficient primitive for aligning to a very broad class of convex low-dimensional rewards. If closeness is measured in Wasserstein distance, the corresponding primitive is a proximal transport oracle: given $x$, solve $\operatorname{argmax}_y \{r(y) - \lambda c(x,y)\}$. This oracle can be efficiently implemented for concave or low-dimensional Lipschitz rewards $r(x)=f(Ax)$. Together, these results illustrate that the choice of distributional distance for alignment affects both the computational primitive and the tractable reward class.
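
To make the two primitives concrete, here is a minimal toy sketch in Python. It is not the paper's algorithm, and every name in it (`kl_tilt`, `kl_objective`, `prox_transport`) is hypothetical. It rests on three assumptions that the abstract does not state: the base law $p$ is replaced by a finite-support distribution instead of a diffusion model, the KL target is read as the maximizer of the penalized objective $\mathbb{E}_q[r] - \lambda\,\mathrm{KL}(q \,\|\, p)$, and the transport cost is taken to be $c(x,y) = \tfrac{1}{2}\|x-y\|^2$.

```python
import numpy as np


def kl_tilt(p, r, lam):
    """Discrete analogue of q(x) ∝ p(x) * exp(r(x) / lam).

    Assumes p has strictly positive entries on a finite support.
    """
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    log_w = np.log(p) + r / lam
    log_w -= log_w.max()  # subtract the max for numerical stability
    w = np.exp(log_w)
    return w / w.sum()


def kl_objective(q, p, r, lam):
    """Penalized objective E_q[r] - lam * KL(q || p) (assumed formulation)."""
    q = np.asarray(q, dtype=float)
    p = np.asarray(p, dtype=float)
    r = np.asarray(r, dtype=float)
    return float(q @ r - lam * np.sum(q * np.log(q / p)))


def prox_transport(x, grad_r, lam, step=0.1, n_steps=500):
    """Approximate argmax_y { r(y) - lam * c(x, y) } by gradient ascent.

    Assumes c(x, y) = 0.5 * ||x - y||^2 and a concave, differentiable r
    whose gradient is supplied as grad_r. The step size must be small
    relative to the smoothness of r plus lam for the iteration to converge.
    """
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for _ in range(n_steps):
        # Gradient of r(y) - lam * 0.5 * ||x - y||^2 with respect to y.
        y = y + step * (grad_r(y) - lam * (y - x))
    return y


if __name__ == "__main__":
    # KL primitive with a linear reward r(x) = <theta, x> on three points.
    points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    theta = np.array([1.0, -1.0])
    p = np.array([0.2, 0.3, 0.5])
    print(kl_tilt(p, points @ theta, lam=1.0))

    # Wasserstein primitive with the concave reward r(y) = -0.5 * ||y - t||^2.
    t = np.array([1.0, -2.0])
    print(prox_transport(np.zeros(2), lambda y: -(y - t), lam=2.0))
```

For the concave quadratic reward in the demo, the proximal step has the closed form $y^\star = (t + \lambda x)/(1 + \lambda)$, which gives a convenient check on the iterative solver. In the KL case, the optimal value of the penalized objective is $\lambda \log \sum_x p(x)\exp(r(x)/\lambda)$, which gives a similar check on the tilt.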
