CV Jul 20, 2020

Cross-View Image Synthesis with Deformable Convolution and Attention Mechanism

arXiv:2007.09858v1, 8 citations
AI Analysis

This paper addresses the problem of generating realistic images from different viewpoints for computer vision applications, but it appears incremental because it builds on existing GAN and attention techniques.

The paper tackles cross-view image synthesis, which is challenging when the views differ so much that their fields overlap little or objects are occluded. It proposes a GAN-based method that uses deformable convolution and attention mechanisms, and reports better results than state-of-the-art methods on the Dayton dataset.

Learning to generate natural scenes has always been a daunting task in computer vision. It is even harder when the images to be generated come from very different views. When the views are very different, the fields of view have little overlap or objects are occluded, which makes the task very challenging. In this paper, we propose to use Generative Adversarial Networks (GANs) based on deformable convolution and an attention mechanism to solve the problem of cross-view image synthesis (see Fig. 1). It is difficult to understand and transform scene appearance and semantic information from another view, so we use deformable convolution in the U-Net network to improve the network's ability to extract features of objects at different scales. Moreover, to better learn the correspondence between images from different views, we apply an attention mechanism to refine the intermediate feature map, thus generating more realistic images. Extensive experiments on images of different sizes from the Dayton dataset [1] show that our model can produce better results than state-of-the-art methods.
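To make the two building blocks named in the abstract more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. The names `bilinear_sample`, `deformable_conv2d`, and `channel_attention` are hypothetical, the offsets are passed in by hand (in a real network an extra convolution layer predicts them), and the attention function is a weight-free channel gate standing in for whatever attention module the authors use, which the abstract does not specify.

```python
import math


def bilinear_sample(img, y, x):
    """Sample a 2D list `img` at fractional (y, x); points outside count as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    # Blend the four integer neighbours, weighted by distance.
    for yi, wy in ((y0, 1.0 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1.0 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yi < h and 0 <= xi < w:
                value += wy * wx * img[yi][xi]
    return value


def deformable_conv2d(img, kernel, offsets):
    """Single-channel deformable convolution (stride 1, zero padding).

    offsets[i][j] holds one (dy, dx) pair per kernel tap, in row-major order.
    Supplying them directly is an assumption made for illustration only.
    """
    h, w = len(img), len(img[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for t, (dy, dx) in enumerate(offsets[i][j]):
                ky, kx = divmod(t, k)
                # Regular grid position plus the per-tap offset.
                acc += kernel[ky][kx] * bilinear_sample(
                    img, i + ky - r + dy, j + kx - r + dx
                )
            out[i][j] = acc
    return out


def channel_attention(features):
    """Scale each channel by sigmoid(global average of that channel).

    A simplified stand-in: real attention modules add learned layers
    between the pooling step and the gate.
    """
    gated = []
    for channel in features:
        n = len(channel) * len(channel[0])
        mean = sum(sum(row) for row in channel) / n
        gate = 1.0 / (1.0 + math.exp(-mean))
        gated.append([[gate * v for v in row] for row in channel])
    return gated
```

With all offsets set to zero, `deformable_conv2d` reduces to an ordinary convolution (strictly, a cross-correlation). Non-zero offsets let each kernel tap read from a shifted, possibly fractional location, which is what allows the sampling grid to adapt to object shape and scale.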

