IV CV · Sep 15, 2023

Increasing diversity of omni-directional images generated from single image using cGAN based on MLPMixer

arXiv:2309.08129v1 · 1 citation · h-index: 13
Originality: Incremental advance
AI Analysis

This work improves image generation for applications like virtual reality by incrementally enhancing diversity and efficiency in a domain-specific task.

This paper tackles the problem of generating diverse omni-directional images from a single snapshot. It addresses two limitations of CNN-based GANs (poor propagation of information to the image edges and high memory usage) with an MLPMixer-based cGAN, achieving competitive performance with lower computational cost and greater diversity.

This paper proposes a novel approach to generating omni-directional images from a single snapshot picture. The previous method relied on generative adversarial networks (GANs) based on convolutional neural networks (CNNs). Although this method successfully generated omni-directional images, CNNs have two drawbacks for this task. First, because a convolutional layer only processes a local area, it is difficult to propagate the information of the input snapshot, which is embedded in the center of the omni-directional image, to the edges of the image. As a result, omni-directional images created by a CNN-based generator tend to have less diversity at their edges, producing similar scene images. Second, a CNN-based model requires a large amount of GPU video memory because it needs a deep structure: shallow layers only receive signals from a limited receptive field. To solve these problems, this paper proposes an MLPMixer-based method. The MLPMixer was proposed as an alternative to self-attention in the transformer and, like self-attention, captures long-range dependencies and contextual information. This allows information to propagate efficiently in the omni-directional image generation task. As a result, the method achieves competitive performance with reduced memory consumption and computational cost, while increasing the diversity of the generated omni-directional images.
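The abstract does not describe the generator's layers, so the following is only a minimal NumPy sketch of a generic MLP-Mixer block (LayerNorm, token-mixing MLP, channel-mixing MLP, skip connections), not the authors' implementation. It illustrates the propagation argument: after a single block, perturbing the center patch already changes the edge patch, whereas stacked stride-1 convolutions with kernel size 3 need roughly one layer per patch of distance. The patch count, channel sizes, random weights, and the helpers `init_params` and `conv_layers_needed` are hypothetical, and the example simplifies the image to a single row of patches.

```python
import math

import numpy as np


def layer_norm(x, eps=1e-6):
    # Normalize each token's channel vector (last axis) to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def gelu(x):
    # tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def mlp(x, w1, w2):
    # Two-layer MLP applied along the last axis of x.
    return gelu(x @ w1) @ w2


def mixer_block(x, p):
    """One generic MLP-Mixer block; x has shape (num_tokens, channels)."""
    # Token mixing: transpose so the MLP runs across tokens (patches),
    # so every output patch depends on every input patch in one step.
    y = layer_norm(x).T                          # (channels, num_tokens)
    x = x + mlp(y, p["tok_w1"], p["tok_w2"]).T   # back to (num_tokens, channels)
    # Channel mixing: the same MLP acts on each patch's features independently.
    return x + mlp(layer_norm(x), p["ch_w1"], p["ch_w2"])


def init_params(num_tokens, channels, hidden, rng, scale=0.1):
    # Hypothetical random initialization, for illustration only.
    return {
        "tok_w1": rng.normal(0.0, scale, (num_tokens, hidden)),
        "tok_w2": rng.normal(0.0, scale, (hidden, num_tokens)),
        "ch_w1": rng.normal(0.0, scale, (channels, hidden)),
        "ch_w2": rng.normal(0.0, scale, (hidden, channels)),
    }


def conv_layers_needed(distance, kernel_size):
    # Stride-1, undilated convolutions (odd kernel_size >= 3) grow the
    # receptive-field radius by (kernel_size - 1) // 2 per layer, so reaching
    # a patch `distance` positions away takes this many stacked layers.
    return math.ceil(distance / ((kernel_size - 1) // 2))


rng = np.random.default_rng(0)
num_tokens, channels = 64, 16        # hypothetical: one row of 64 patches
params = init_params(num_tokens, channels, hidden=32, rng=rng)

x = rng.normal(size=(num_tokens, channels))
x_perturbed = x.copy()
center = num_tokens // 2
x_perturbed[center] += rng.normal(size=channels)   # change the "snapshot" patch

out = mixer_block(x, params)
out_perturbed = mixer_block(x_perturbed, params)
edge_change = np.abs(out[0] - out_perturbed[0]).max()

print(edge_change > 0)                             # True: edge reacts after one block
print(conv_layers_needed(center, kernel_size=3))   # 32 conv layers to reach the edge
```

One design consequence worth noting: the token-mixing weights have shape `(num_tokens, hidden)`, so their size grows with the number of patches, which is why Mixer-style models operate on patches rather than individual pixels.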

Code implementations: 1 repo
