94.7MLMay 7
Spherical Flows for Sampling Categorical DataJannis Chemseddine, Gregor Kornhardt, Gabriele Steidl
We study the problem of learning generative models for discrete sequences in a continuous embedding space. Whereas prior approaches typically operate in Euclidean space or on the probability simplex, we instead work on the sphere $\mathbb S^{d-1}$. There the von Mises-Fisher (vMF) distribution induces a natural noise process and admits a closed-form conditional score. The conditional velocity is in general intractable. Exploiting the radial symmetry of the vMF density we reduce the continuity equation on $\mathbb S^{d-1}$ to a scalar ODE in the cosine similarity, whose unique bounded solution determines the velocity. The marginal velocity and marginal score on $(\mathbb S^{d-1})^L$ both decompose into posterior-weighted tangent sums that differ only by per-token scalar weights. This gives access to both ODE and predictor-corrector (PC) sampling. The posterior is the only learned object, trained by a cross-entropy loss. Experiments compare the vMF path against geodesic and Euclidean alternatives. The combination of vMF and PC sampling significantly improves results on Sudoku and language modeling.
LGNov 30, 2022
Universal Neural Optimal TransportJonathan Geuter, Gregor Kornhardt, Ingimar Tomasson et al.
Optimal Transport (OT) problems are a cornerstone of many applications, but solving them is computationally expensive. To address this problem, we propose UNOT (Universal Neural Optimal Transport), a novel framework capable of accurately predicting (entropic) OT distances and plans between discrete measures for a given cost function. UNOT builds on Fourier Neural Operators, a universal class of neural networks that map between function spaces and that are discretization-invariant, which enables our network to process measures of variable resolutions. The network is trained adversarially using a second, generating network and a self-supervised bootstrapping loss. We ground UNOT in an extensive theoretical framework. Through experiments on Euclidean and non-Euclidean domains, we show that our network not only accurately predicts OT distances and plans across a wide range of datasets, but also captures the geometry of the Wasserstein space correctly. Furthermore, we show that our network can be used as a state-of-the-art initialization for the Sinkhorn algorithm with speedups of up to $7.4\times$, significantly outperforming existing approaches.
86.1LGMar 17
Self-Aware Markov Models for Discrete ReasoningGregor Kornhardt, Jannis Chemseddine, Christian Wald et al.
Standard masked discrete diffusion models face limitations in reasoning tasks due to their inability to correct their own mistakes on the masking path. Since they rely on a fixed number of denoising steps, they are unable to adjust their computation to the complexity of a given problem. To address these limitations, we introduce a method based on learning a Markov transition kernel that is trained on its own outputs. This design enables tokens to be remasked, allowing the model to correct its previous mistakes. Furthermore, we do not need a fixed time schedule but use a trained stopping criterion. This allows for adaptation of the number of function evaluations to the difficulty of the reasoning problem. Our adaptation adds two lightweight prediction heads, enabling reuse and fine-tuning of existing pretrained models. On the Sudoku-Extreme dataset we clearly outperform other flow based methods with a validity of 95%. For the Countdown-4 we only need in average of 10 steps to solve almost 96% of them correctly, while many problems can be solved already in 2 steps.
LGDec 5, 2025
RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMsJonathan Geuter, Gregor Kornhardt
Best-of-$n$ is a widely used test-time scaling approach for LLM inference. Yet despite evidence that LLMs exhibit complementary strengths across tasks, traditionally best-of-$n$ relies on a single model to generate responses. We propose RoBoN (Routed Online Best-of-$n$), a sequential multi-LLM alternative to the prevailing single-model best-of-$n$. Given a suite of models $\{m_i\}_{i=1}^M$, RoBoN sequentially routes generations one-by-one across models, based on scores computed using a reward model and an agreement signal on the predicted responses. This online routing requires no additional training, keeps compute parity, and works with any plug-in reward model. Across reasoning benchmarks (MATH500, OlympiadBench, MinervaMath, GSM8K, MMLU), RoBoN consistently outperforms standard best-of-$n$ applied to each individual model for larger $n$, with gains of up to 3.4\% in absolute accuracy, and also improves over a uniform multi-model portfolio baseline. Our results indicate that diversity across models can be exploited at inference to improve best-of-$n$ performance over any constituent model alone, providing a simple, training-free path to test-time scaling with multiple LLMs.
MLOct 14, 2025
Adapting Noise to Data: Generative Flows from 1D ProcessesJannis Chemseddine, Gregor Kornhardt, Richard Duong et al.
We introduce a general framework for constructing generative models using one-dimensional noising processes. Beyond diffusion processes, we outline examples that demonstrate the flexibility of our approach. Motivated by this, we propose a novel framework in which the 1D processes themselves are learnable, achieved by parameterizing the noise distribution through quantile functions that adapt to the data. Our construction integrates seamlessly with standard objectives, including Flow Matching and consistency models. Learning quantile-based noise naturally captures heavy tails and compact supports when present. Numerical experiments highlight both the flexibility and the effectiveness of our method.
LGSep 2, 2021
Solving Inverse Problems with Conditional-GAN Prior via Fast Network-Projected Gradient DescentMuhammad Fadli Damara, Gregor Kornhardt, Peter Jung
The projected gradient descent (PGD) method has shown to be effective in recovering compressed signals described in a data-driven way by a generative model, i.e., a generator which has learned the data distribution. Further reconstruction improvements for such inverse problems can be achieved by conditioning the generator on the measurement. The boundary equilibrium generative adversarial network (BEGAN) implements an equilibrium based loss function and an auto-encoding discriminator to better balance the performance of the generator and the discriminator. In this work we investigate a network-based projected gradient descent (NPGD) algorithm for measurement-conditional generative models to solve the inverse problem much faster than regular PGD. We combine the NPGD with conditional GAN/BEGAN to evaluate their effectiveness in solving compressed sensing type problems. Our experiments on the MNIST and CelebA datasets show that the combination of measurement conditional model with NPGD works well in recovering the compressed signal while achieving similar or in some cases even better performance along with a much faster reconstruction. The achieved reconstruction speed-up in our experiments is up to 140-175.