Autoregressive Generative Modeling with Noise Conditional Maximum Likelihood Estimation
This work addresses the covariate shift problem in autoregressive models for image generation and reports improved test likelihood (bits per dimension) and sample quality (FID).
The paper improves autoregressive generative models by introducing a noise conditional maximum likelihood estimation framework, which enhances robustness and sample quality. It achieves 3.32 bits per dimension on ImageNet 64x64 and reduces FID from 37.50 to 12.09 on CIFAR-10.
We introduce a simple modification to the standard maximum likelihood estimation (MLE) framework. Rather than maximizing a single unconditional likelihood of the data under the model, we maximize a family of \textit{noise conditional} likelihoods consisting of the data perturbed by a continuum of noise levels. We find that models trained this way are more robust to noise, obtain higher test likelihoods, and generate higher quality images. They can also be sampled via a novel score-based sampling scheme that combats the classical \textit{covariate shift} problem that occurs during sample generation in autoregressive models. Applying this augmentation to autoregressive image models, we obtain 3.32 bits per dimension on the ImageNet 64x64 dataset, and substantially improve the quality of generated samples in terms of the Fréchet Inception distance (FID), from 37.50 to 12.09 on the CIFAR-10 dataset.
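To make the idea of a family of noise conditional likelihoods concrete, the following is a minimal sketch rather than the paper's implementation. It assumes additive Gaussian noise, noise levels drawn uniformly from [0, sigma_max], and a toy one-dimensional Gaussian model in place of an autoregressive network; the function names and parameters (`noise_conditional_nll`, `sigma_max`, `n_levels`) are hypothetical and chosen only for illustration.

```python
import math
import random


def gaussian_log_pdf(x, mean, var):
    """Log density of a univariate Gaussian N(mean, var) at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def noise_conditional_nll(data, mean, var, sigma_max=1.0, n_levels=64, seed=0):
    """Monte Carlo estimate of the negative log-likelihood averaged over noise levels.

    Assumptions (not taken from the source): the noise is additive Gaussian,
    the noise level sigma is uniform on [0, sigma_max], and the model is a toy
    Gaussian N(mean, var). Under this toy model, data perturbed at level sigma
    has density N(mean, var + sigma^2), which plays the role of the
    noise conditional likelihood.
    """
    rng = random.Random(seed)
    total = 0.0
    count = 0
    for _ in range(n_levels):
        # Draw one noise level from the (assumed) uniform distribution.
        sigma = rng.uniform(0.0, sigma_max)
        for x in data:
            # Perturb the data point at this noise level.
            x_noisy = x + sigma * rng.gauss(0.0, 1.0)
            # Score the perturbed point under the likelihood conditioned on sigma.
            total -= gaussian_log_pdf(x_noisy, mean, var + sigma ** 2)
            count += 1
    return total / count
```

Minimizing this quantity over `mean` and `var` corresponds to maximizing the averaged noise conditional likelihood in the toy setting. Setting `sigma_max` to zero recovers the ordinary negative log-likelihood, which is the sense in which the approach is a modification of standard MLE.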