CV · Jun 15, 2023

Diffusion Models for Open-Vocabulary Segmentation

Laurynas Karazija, Iro Laina, Andrea Vedaldi, Christian Rupprecht

arXiv:2306.09316v2 · 26.169 citations · h-index: 105

Originality: Highly original

AI Analysis

The paper addresses the need for efficient open-vocabulary segmentation algorithms that do not require costly training, offering a practical solution for computer vision applications.

The paper tackles the problem of open-vocabulary segmentation by proposing OVDiff, a method that uses pre-trained text-to-image diffusion models to synthesize support image sets for arbitrary categories, enabling segmentation without additional data, annotations, or training. It achieves a lead of over 5% on PASCAL VOC compared to prior work.

Open-vocabulary segmentation is the task of segmenting anything that can be named in an image. Recently, large-scale vision-language modelling has led to significant advances in open-vocabulary segmentation, but at the cost of gargantuan and increasing training and annotation efforts. Hence, we ask if it is possible to use existing foundation models to synthesise on-demand efficient segmentation algorithms for specific class sets, making them applicable in an open-vocabulary setting without the need to collect further data, annotations or perform training. To that end, we present OVDiff, a novel method that leverages generative text-to-image diffusion models for unsupervised open-vocabulary segmentation. OVDiff synthesises support image sets for arbitrary textual categories, creating for each a set of prototypes representative of both the category and its surrounding context (background). It relies solely on pre-trained components and outputs the synthesised segmenter directly, without training. Our approach shows strong performance on a range of benchmarks, obtaining a lead of more than 5% over prior work on PASCAL VOC.
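The prototype idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that features for the synthesised support images and for the target image have already been extracted by some pre-trained network, that a foreground mask is available for each support set, and that a single averaged prototype per category and per background is enough. The function names (`build_prototypes`, `segment`) are hypothetical and chosen only for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def build_prototypes(features, fg_mask):
    """Average support features into one foreground and one background prototype.

    features: (N, D) array of support-set features (assumed precomputed).
    fg_mask:  (N,) boolean array, True where a feature belongs to the category.
    """
    fg = features[fg_mask].mean(axis=0)
    bg = features[~fg_mask].mean(axis=0)
    return l2_normalize(fg), l2_normalize(bg)


def segment(pixel_features, class_prototypes, background_prototypes):
    """Label each pixel with its most similar prototype.

    pixel_features:        (H, W, D) features of the image to segment.
    class_prototypes:      (C, D) one unit-length prototype per category.
    background_prototypes: (B, D) unit-length background prototypes.
    Returns an (H, W) integer map: 0 is background, c + 1 is category c.
    """
    h, w, d = pixel_features.shape
    feats = l2_normalize(pixel_features.reshape(-1, d))
    class_sim = feats @ class_prototypes.T                    # (H*W, C)
    # A pixel counts as background if any background prototype matches best.
    bg_sim = (feats @ background_prototypes.T).max(axis=1, keepdims=True)
    scores = np.concatenate([bg_sim, class_sim], axis=1)      # (H*W, 1 + C)
    return scores.argmax(axis=1).reshape(h, w)
```

In use, `build_prototypes` would be called once per category name on features from that category's synthesised support set, the resulting prototypes stacked into `class_prototypes` and `background_prototypes`, and `segment` then applied to any image's features without further training.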
