Panoptic-based Image Synthesis
This work addresses the challenge of generating photorealistic images of complex scenes with multiple instances, for applications ranging from content editing to content generation. It represents an incremental advance over existing semantic map-based methods.
The paper tackles conditional image synthesis in complex environments with occluding instances by proposing a panoptic-aware network that uses panoptic maps to unify semantic and instance information. The results show improvements over previous state-of-the-art methods: higher-fidelity images in scenes with complex instance interactions, finer detail for tiny objects, and gains in the mean IoU and detAP metrics.
Conditional image synthesis for generating photorealistic images serves various applications, from content editing to content generation. Previous conditional image synthesis algorithms mostly rely on semantic maps and often fail in complex environments where multiple instances occlude each other. We propose a panoptic-aware image synthesis network to generate high-fidelity, photorealistic images conditioned on panoptic maps, which unify semantic and instance information. To achieve this, we efficiently use panoptic maps in the convolution and upsampling layers. We show that, with the proposed changes to the generator, we can improve on the previous state-of-the-art methods by generating images of environments with complex instance interactions in higher fidelity and tiny objects in more detail. Furthermore, our proposed method outperforms the previous state-of-the-art methods on the mean IoU (Intersection over Union) and detAP (Detection Average Precision) metrics.
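To illustrate why a panoptic map carries more information than a semantic map when instances of the same class touch, the following is a minimal sketch on a toy row of pixels. The class ids, the `to_panoptic` and `boundaries` helpers, and the `class_id * 1000 + instance_id` encoding are assumptions made for illustration only; they are not taken from the paper and do not describe how the proposed network consumes panoptic maps.

```python
# Toy 1x4 row of pixels: two cars side by side, then road.
# Class ids are hypothetical: 1 = car, 0 = road.
semantic = [[1, 1, 1, 0]]
# Instance ids within a class; 0 means "no instance" (background/stuff).
instance = [[1, 1, 2, 0]]


def to_panoptic(semantic, instance, max_instances=1000):
    """Merge per-pixel class and instance ids into one panoptic id.

    The class * max_instances + instance encoding is an assumed
    convention for this sketch, not the paper's format.
    """
    return [
        [s * max_instances + i for s, i in zip(sem_row, inst_row)]
        for sem_row, inst_row in zip(semantic, instance)
    ]


def boundaries(row):
    """Indices where the label changes between adjacent pixels."""
    return [x for x in range(1, len(row)) if row[x] != row[x - 1]]


panoptic = to_panoptic(semantic, instance)

# The semantic map only sees the car/road edge; the panoptic map
# also exposes the edge between the two adjacent cars.
semantic_edges = boundaries(semantic[0])
panoptic_edges = boundaries(panoptic[0])
```

In this toy case the semantic row has a single label change (car to road), while the panoptic row has an additional change between the two cars. That extra boundary is the kind of instance information a generator conditioned only on semantic maps has no access to.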