CV · AI · LG · Dec 19, 2024

Jet: A Modern Transformer-Based Normalizing Flow

arXiv:2412.15129v1 · 11 citations · h-index: 36
Originality: Incremental advance
AI Analysis

This work proposes a more efficient normalizing flow for generative modeling. The advance is incremental, since it builds on existing flow methods.

The paper aims to improve normalizing flow models for natural image generation. It revisits earlier design choices and replaces CNNs with Vision Transformer blocks, reporting state-of-the-art quantitative and qualitative performance with a simpler architecture. Visual quality still lags behind the top generative models.

In the past, normalizing generative flows emerged as a promising class of generative models for natural images. This type of model has many modeling advantages: the ability to efficiently compute the log-likelihood of the input data, fast generation, and a simple overall structure. Normalizing flows remained a topic of active research but later fell out of favor, as the visual quality of the samples was not competitive with other model classes, such as GANs, VQ-VAE-based approaches, or diffusion models. In this paper we revisit the design of coupling-based normalizing flow models by carefully ablating prior design choices and using computational blocks based on the Vision Transformer architecture rather than convolutional neural networks. As a result, we achieve state-of-the-art quantitative and qualitative performance with a much simpler architecture. While the overall visual quality is still behind the current state-of-the-art models, we argue that strong normalizing flow models can help advance the research frontier by serving as building components of more powerful generative models.
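To make "coupling-based" and "efficiently compute log-likelihood" concrete, here is a minimal sketch of one generic affine coupling layer in plain Python. It is not the paper's architecture. The `conditioner` function is a hypothetical fixed stand-in for the learned network (a Vision Transformer block in the paper), and all function names are illustrative assumptions. The point is only that the layer inverts exactly and that its log-determinant is a simple sum.

```python
import math


def conditioner(x_a, weight=0.5):
    """Hypothetical stand-in for a learned network (assumption, not from the paper).

    Maps the untouched half of the input to a per-dimension
    (log_scale, shift) pair used to transform the other half.
    """
    log_scale = [math.tanh(weight * v) for v in x_a]  # bounded for stability
    shift = [weight * v for v in x_a]
    return log_scale, shift


def coupling_forward(x):
    """Map data x to latent z; return z and log|det Jacobian|."""
    assert len(x) % 2 == 0, "this sketch splits the input into equal halves"
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_scale, shift = conditioner(x_a)
    z_b = [b * math.exp(s) + t for b, s, t in zip(x_b, log_scale, shift)]
    # The Jacobian is triangular, so its log-determinant is sum(log_scale).
    return x_a + z_b, sum(log_scale)


def coupling_inverse(z):
    """Exact inverse of coupling_forward (this is the generation direction)."""
    assert len(z) % 2 == 0
    half = len(z) // 2
    z_a, z_b = z[:half], z[half:]
    log_scale, shift = conditioner(z_a)  # z_a equals x_a, so same outputs
    x_b = [(b - t) * math.exp(-s) for b, s, t in zip(z_b, log_scale, shift)]
    return z_a + x_b


def log_likelihood(x):
    """Exact log p(x) under a standard normal prior on z (change of variables)."""
    z, log_det = coupling_forward(x)
    log_prior = sum(-0.5 * v * v - 0.5 * math.log(2 * math.pi) for v in z)
    return log_prior + log_det
```

For example, `coupling_inverse(coupling_forward(x)[0])` recovers `x` up to floating-point error. A practical flow stacks many such layers and alternates which half is left unchanged, so that every dimension is eventually transformed.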

