Categorical SDEs with Simplex Diffusion
This note provides a theoretical foundation for diffusing categorical data, which could benefit applications such as text generation or reinforcement learning. The contribution is incremental, however, since it adapts existing diffusion frameworks.
The paper addresses the application of diffusion models to categorical-valued data by proposing Simplex Diffusion, which diffuses datapoints directly on an n-dimensional probability simplex. The corresponding SDE is realized by a multi-dimensional Cox-Ingersoll-Ross (CIR) process, and the construction is related to the Dirichlet distribution on the simplex.
Diffusion models typically operate in the standard framework of generative modelling by producing continuous-valued datapoints. To this end, they rely on a progressive Gaussian smoothing of the original data distribution, which admits an SDE interpretation involving increments of a standard Brownian motion. However, some applications such as text generation or reinforcement learning might naturally be better served by diffusing categorical-valued data, i.e., lifting the diffusion to a space of probability distributions. To address this, this short theoretical note proposes Simplex Diffusion, a means to directly diffuse datapoints located on an n-dimensional probability simplex. We show how this relates to the Dirichlet distribution on the simplex and how the analogous SDE is realized thanks to a multi-dimensional Cox-Ingersoll-Ross process (abbreviated as CIR), previously used in economics and mathematical finance. Finally, we remark on the numerical implementation of trajectories of the CIR process, and discuss some limitations of our approach.
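The link between the CIR process and the Dirichlet distribution can be illustrated numerically. The sketch below is not the note's own implementation; it rests on the following assumptions. Each coordinate follows an independent CIR SDE, dY_i = theta (mu_i - Y_i) dt + sigma sqrt(Y_i) dW_i, whose stationary law is a Gamma distribution with shape 2 theta mu_i / sigma^2 and rate 2 theta / sigma^2. Choosing mu_i = alpha_i sigma^2 / (2 theta) gives shape alpha_i with a common rate, so the normalized vector Y / sum(Y) is approximately Dirichlet(alpha) once the process has mixed. The function name `simulate_simplex_cir`, its parameters, the Euler-Maruyama scheme with full truncation, and the small positive floor before normalization are all illustrative choices.

```python
import numpy as np

def simulate_simplex_cir(alpha, y0, theta=1.0, sigma=1.0,
                         t_end=5.0, n_steps=1000, rng=None):
    """Simulate independent CIR coordinates and project them onto the simplex.

    Hypothetical helper: Euler-Maruyama with full truncation, one common
    discretization of the CIR SDE (an assumption, not the note's scheme).
    alpha: shape (n,), target Dirichlet concentration parameters.
    y0:    shape (..., n), non-negative starting values.
    """
    rng = np.random.default_rng() if rng is None else rng
    alpha = np.asarray(alpha, dtype=float)
    y = np.array(y0, dtype=float)
    # Long-run mean chosen so the stationary Gamma shape equals alpha_i.
    mu = alpha * sigma**2 / (2.0 * theta)
    dt = t_end / n_steps
    for _ in range(n_steps):
        # Full truncation: use max(y, 0) in both drift and diffusion terms,
        # so the square root is always well defined.
        y_pos = np.maximum(y, 0.0)
        dw = rng.normal(0.0, np.sqrt(dt), size=y.shape)
        y = y + theta * (mu - y_pos) * dt + sigma * np.sqrt(y_pos) * dw
    # Discretized paths can end slightly negative; floor before normalizing.
    y = np.maximum(y, 1e-12)
    return y / y.sum(axis=-1, keepdims=True)

# Usage: diffuse four copies of a one-hot category over three classes.
start = np.tile(np.array([1.0, 0.0, 0.0]), (4, 1))
points = simulate_simplex_cir([2.0, 3.0, 5.0], start,
                              rng=np.random.default_rng(0))
```

Each row of `points` is non-negative and sums to one, so it is a valid point on the probability simplex. Over a large batch and a long enough horizon, the empirical mean should approach alpha / sum(alpha), up to discretization and Monte Carlo error.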