ML · CL · LG · Feb 10, 2021

Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions

arXiv:2102.05379v3 · 670 citations
Originality: Incremental advance
AI Analysis

This paper addresses the challenge of modeling categorical distributions for applications in language and computer vision. It is an incremental advance that adapts existing generative frameworks (flows and diffusion models) to a new data type.

The paper tackles the problem of generating categorical data like text and image segmentation by introducing Argmax Flows and Multinomial Diffusion, which extend flows and diffusion models from ordinal to categorical domains. The results show that these methods outperform existing dequantization approaches in log-likelihood on text and image segmentation tasks.

Generative flows and diffusion models have been predominantly trained on ordinal data, for example natural images. This paper introduces two extensions of flows and diffusion for categorical data such as language or image segmentation: Argmax Flows and Multinomial Diffusion. Argmax Flows are defined by a composition of a continuous distribution (such as a normalizing flow), and an argmax function. To optimize this model, we learn a probabilistic inverse for the argmax that lifts the categorical data to a continuous space. Multinomial Diffusion gradually adds categorical noise in a diffusion process, for which the generative denoising process is learned. We demonstrate that our method outperforms existing dequantization approaches on text modelling and modelling on image segmentation maps in log-likelihood.
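
The two mechanisms described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions: the function names `argmax_lift` and `multinomial_diffusion_step` are hypothetical, the lift uses a simple softplus thresholding trick on Gaussian noise rather than a learned probabilistic inverse, and the noise step assumes the transition distribution Cat((1 - beta) * onehot(x) + beta / K), which mixes the current class with a uniform distribution over K classes.

```python
import numpy as np


def argmax_lift(x, num_classes, rng):
    """Sample continuous v such that argmax(v, axis=-1) == x.

    Hypothetical, unlearned stand-in for the probabilistic inverse of argmax:
    Gaussian noise is thresholded so that the true class holds the largest value.
    """
    u = rng.normal(size=x.shape + (num_classes,))
    onehot = np.eye(num_classes, dtype=bool)[x]
    # Threshold T is the noise value at the true class index.
    T = np.take_along_axis(u, x[..., None], axis=-1)
    # softplus(T - u) is positive, so T - softplus(T - u) lies below T.
    below = T - np.logaddexp(0.0, T - u)
    return np.where(onehot, T, below)


def multinomial_diffusion_step(x_prev, beta, num_classes, rng):
    """One forward noising step on integer class labels (assumed form).

    With probability weight (1 - beta) the class is kept; the remaining
    weight beta is spread uniformly over all classes.
    """
    onehot = np.eye(num_classes)[x_prev]
    probs = (1.0 - beta) * onehot + beta / num_classes
    # Inverse-CDF sampling, one categorical draw per element.
    cdf = np.cumsum(probs, axis=-1)
    r = rng.random(x_prev.shape + (1,))
    idx = (r > cdf).sum(axis=-1)
    # Guard against floating-point round-off in the last CDF entry.
    return np.minimum(idx, num_classes - 1)


rng = np.random.default_rng(0)
x = rng.integers(0, 5, size=(4, 7))        # toy categorical data, 5 classes
v = argmax_lift(x, 5, rng)                 # continuous representation
x_noisy = multinomial_diffusion_step(x, 0.1, 5, rng)
```

In this sketch, taking the argmax of `v` along the last axis recovers `x`, which is the property a flow over `v` relies on. Setting `beta` to 0 leaves the labels unchanged, and setting it to 1 resamples them uniformly.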

Code Implementations: 4 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
