CV · LG · Oct 26, 2024

Your Image is Secretly the Last Frame of a Pseudo Video

arXiv:2410.20158v3 · h-index: 1
Originality: Incremental advance
AI Analysis

This work proposes an incremental method for improving image generation: training generative models with additional self-supervision from pseudo videos.

The paper aims to improve image quality in generative models. It hypothesizes that diffusion models benefit from the self-supervision that corrupted images provide to their intermediate latent states; together with the original image, these corrupted images form a pseudo video. It then shows that training video generative models on pseudo videos built with expressive data augmentation improves image quality on the CIFAR10 and CelebA datasets.

Diffusion models, which can be viewed as a special case of hierarchical variational autoencoders (HVAEs), have shown profound success in generating photo-realistic images. In contrast, standard HVAEs often produce images of inferior quality compared to diffusion models. In this paper, we hypothesize that the success of diffusion models can be partly attributed to the additional self-supervision information for their intermediate latent states provided by corrupted images, which along with the original image form a pseudo video. Based on this hypothesis, we explore the possibility of improving other types of generative models with such pseudo videos. Specifically, we first extend a given image generative model to its video generative model counterpart, and then train the video generative model on pseudo videos constructed by applying data augmentation to the original images. Furthermore, we analyze the potential issues of first-order Markov data augmentation methods, which are typically used in diffusion models, and propose to use more expressive data augmentation to construct more useful information in pseudo videos. Our empirical results on the CIFAR10 and CelebA datasets demonstrate that improved image generation quality can be achieved with additional self-supervised information from pseudo videos.
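
The pseudo-video construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `make_pseudo_video`, the Gaussian-noise corruption, the fixed noise level `beta`, and the frame count are all assumptions chosen for illustration. The sketch uses a first-order Markov augmentation (each corrupted frame depends only on the previous one), which is the diffusion-style baseline the abstract analyzes rather than the more expressive augmentation it proposes. Frames are ordered so that the original image is the last frame, as the title suggests.

```python
import numpy as np


def make_pseudo_video(image, num_frames=4, beta=0.1, seed=0):
    """Build a pseudo video whose last frame is the clean image.

    Hypothetical helper for illustration only. `beta` is an assumed,
    constant per-step noise level, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    frames = [np.asarray(image, dtype=np.float64)]
    for _ in range(num_frames - 1):
        prev = frames[-1]
        noise = rng.standard_normal(prev.shape)
        # First-order Markov step: the new frame depends only on the
        # previous frame plus fresh Gaussian noise.
        frames.append(np.sqrt(1.0 - beta) * prev + np.sqrt(beta) * noise)
    # Reverse the chain so corruption decreases over time and the
    # original image ends up as the final frame.
    return np.stack(frames[::-1], axis=0)


# Usage: a CIFAR10-sized dummy image yields a (frames, H, W, C) array.
image = np.ones((32, 32, 3))
video = make_pseudo_video(image, num_frames=4)
print(video.shape)
```

A video generative model would then be trained on arrays like `video`, with image samples read off the final frame. Swapping the noise step for a different corruption changes only the body of the loop.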

