CVSep 4, 2023

Exploring Limits of Diffusion-Synthetic Training with Weakly Supervised Semantic Segmentation

arXiv:2309.01369v25 citations
Originality Incremental advance
AI Analysis

This work addresses the problem of improving weakly supervised semantic segmentation for researchers and practitioners by offering incremental enhancements to existing diffusion-synthetic methods.

The paper tackled limitations in diffusion-synthetic training for semantic segmentation by introducing reliability-aware robust training, prompt augmentation, and LoRA-based adaptation, resulting in effectively closing the gap between real and synthetic training on datasets like PASCAL VOC, ImageNet-S, and Cityscapes.

The advance of generative models for images has inspired various training techniques for image recognition utilizing synthetic images. In semantic segmentation, one promising approach is extracting pseudo-masks from attention maps in text-to-image diffusion models, which enables real-image-and-annotation-free training. However, the pioneering training method using the diffusion-synthetic images and pseudo-masks, i.e., DiffuMask has limitations in terms of mask quality, scalability, and ranges of applicable domains. To overcome these limitations, this work introduces three techniques for diffusion-synthetic semantic segmentation training. First, reliability-aware robust training, originally used in weakly supervised learning, helps segmentation with insufficient synthetic mask quality. %Second, large-scale pretraining of whole segmentation models, not only backbones, on synthetic ImageNet-1k-class images with pixel-labels benefits downstream segmentation tasks. Second, we introduce prompt augmentation, data augmentation to the prompt text set to scale up and diversify training images with a limited text resources. Finally, LoRA-based adaptation of Stable Diffusion enables the transfer to a distant domain, e.g., auto-driving images. Experiments in PASCAL VOC, ImageNet-S, and Cityscapes show that our method effectively closes gap between real and synthetic training in semantic segmentation.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes