CV | Sep 25, 2023

Dataset Diffusion: Diffusion-based Synthetic Dataset Generation for Pixel-Level Semantic Segmentation

arXiv:2309.14303v4159 citations | h-index: 26 | Has Code
Originality: Incremental advance
AI Analysis

This paper addresses the high cost of pixel-level annotation for semantic segmentation by generating synthetic training data. The advance is incremental, but it improves on existing methods.

The paper tackles the labor-intensive task of preparing pixel-level semantic segmentation training data by proposing a novel method to generate synthetic images and corresponding segmentation maps using Stable Diffusion, eliminating the need for manual annotation. The approach outperforms concurrent work on PASCAL VOC and MSCOCO datasets.

Preparing training data for deep vision models is a labor-intensive task. To address this, generative models have emerged as an effective solution for generating synthetic data. While current generative models produce image-level category labels, we propose a novel method for generating pixel-level semantic segmentation labels using the text-to-image generative model Stable Diffusion (SD). By utilizing the text prompts, cross-attention, and self-attention of SD, we introduce three new techniques: class-prompt appending, class-prompt cross-attention, and self-attention exponentiation. These techniques enable us to generate segmentation maps corresponding to synthetic images. These maps serve as pseudo-labels for training semantic segmenters, eliminating the need for labor-intensive pixel-wise annotation. To account for the imperfections in our pseudo-labels, we incorporate uncertainty regions into the segmentation, allowing us to disregard loss from those regions. We conduct evaluations on two datasets, PASCAL VOC and MSCOCO, and our approach significantly outperforms concurrent work. Our benchmarks and code will be released at https://github.com/VinAIResearch/Dataset-Diffusion
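The abstract names self-attention exponentiation and uncertainty regions without giving formulas, so the following is only a minimal sketch of how those two ideas might fit together, not the authors' implementation. It assumes that "exponentiation" means repeatedly multiplying a per-pixel cross-attention map by a row-normalized self-attention matrix, and that pixels whose best class score falls between two thresholds are marked uncertain and skipped in the loss. The names `IGNORE`, `power`, `low`, and `high`, their values, and the per-class max normalization are all hypothetical choices made for illustration.

```python
import math

IGNORE = 255  # hypothetical label id for uncertain pixels (assumption)


def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def refine_cross_attention(self_attn, cross_attn, power=2):
    """Left-multiply cross-attention (N x C) by self-attention (N x N)
    `power` times, i.e. compute (self_attn ** power) @ cross_attn.
    The default power is an assumption, not a value from the source."""
    refined = cross_attn
    for _ in range(power):
        refined = matmul(self_attn, refined)
    return refined


def normalize_per_class(scores):
    """Scale each class column so that its largest score becomes 1."""
    n_cls = len(scores[0])
    col_max = [max(row[c] for row in scores) or 1.0 for c in range(n_cls)]
    return [[row[c] / col_max[c] for c in range(n_cls)] for row in scores]


def pseudo_labels(scores, low=0.3, high=0.6):
    """Turn per-pixel class scores into labels: 0 for background,
    IGNORE for uncertain pixels, otherwise class index + 1.
    The thresholds `low` and `high` are assumed values."""
    labels = []
    for row in scores:
        best = max(range(len(row)), key=row.__getitem__)
        if row[best] < low:
            labels.append(0)
        elif row[best] < high:
            labels.append(IGNORE)
        else:
            labels.append(best + 1)
    return labels


def masked_cross_entropy(probs, labels):
    """Mean negative log-likelihood over pixels that are not IGNORE.
    probs[i] holds predicted probabilities (index 0 = background)."""
    kept = [(p, y) for p, y in zip(probs, labels) if y != IGNORE]
    if not kept:
        return 0.0
    return -sum(math.log(p[y]) for p, y in kept) / len(kept)


if __name__ == "__main__":
    # Four pixels, two classes; pixels 0-1 and 2-3 attend to each other.
    self_attn = [[0.5, 0.5, 0.0, 0.0],
                 [0.5, 0.5, 0.0, 0.0],
                 [0.0, 0.0, 0.5, 0.5],
                 [0.0, 0.0, 0.5, 0.5]]
    cross_attn = [[0.8, 0.0], [0.4, 0.0], [0.0, 0.6], [0.0, 0.2]]
    refined = normalize_per_class(refine_cross_attention(self_attn, cross_attn))
    print(pseudo_labels(refined))
```

In a framework such as PyTorch, the same masking is usually obtained by passing the uncertain label id as the `ignore_index` of the cross-entropy loss rather than filtering pixels by hand.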

Code Implementations: 1 repo
