CV | MM | Apr 24, 2025

DIVE: Inverting Conditional Diffusion Models for Discriminative Tasks

arXiv:2504.17253v1 | 2 citations | h-index: 12 | Has Code | IEEE Transactions on Multimedia
Originality: Incremental advance
AI Analysis

This work addresses the challenge of repurposing generative models for discriminative tasks. It extends diffusion-based discrimination from classification to object detection and makes diffusion-based classification considerably faster. It is an incremental advance, as it builds on existing diffusion model frameworks.

The paper tackles the problem of using pretrained diffusion models for discriminative tasks. It extends their capability from classification to object detection by inverting a layout-to-image diffusion model, reaching performance on par with basic discriminative baselines on the COCO dataset. It also greatly speeds up a previous diffusion-based classification method without loss of accuracy.

Diffusion models have shown remarkable progress in various generative tasks, such as image and video generation. This paper studies the problem of leveraging pretrained diffusion models to perform discriminative tasks. Specifically, we extend the discriminative capability of pretrained, frozen generative diffusion models from classification to the more complex task of object detection by "inverting" a pretrained layout-to-image diffusion model. To this end, we propose a gradient-based discrete optimization approach that replaces the heavy prediction-enumeration process, and a prior distribution model that makes more accurate use of Bayes' rule. Empirical results show that this method is on par with basic discriminative object detection baselines on the COCO dataset. In addition, our method can greatly speed up the previous diffusion-based method for classification without sacrificing accuracy. Code and models are available at https://github.com/LiYinqi/DIVE.
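Inverting a conditional diffusion model rests on Bayes' rule, p(c | x) ∝ p(x | c) p(c): enumeration-based diffusion classifiers use the negative expected denoising error of the conditional model as a stand-in for log p(x | c) and score every candidate condition c. The sketch below is a minimal illustration under stated assumptions, not DIVE's implementation. It shows that enumeration baseline plus an explicit log-prior term. It does not implement the gradient-based discrete optimization that, according to the abstract, replaces the enumeration. The toy denoiser `toy_eps_pred`, the uniform noise-level schedule, and the helper names `diffusion_scores` and `classify` are all hypothetical stand-ins for a real pretrained model.

```python
import math
import random


def toy_eps_pred(x_t, abar, mu_c):
    # Hypothetical toy "denoiser": it assumes the clean signal for class c is exactly
    # mu_c, so its noise prediction is (x_t - sqrt(abar) * mu_c) / sqrt(1 - abar).
    # A real pretrained conditional diffusion model would replace this function.
    return [(xt - math.sqrt(abar) * m) / math.sqrt(1.0 - abar) for xt, m in zip(x_t, mu_c)]


def diffusion_scores(x0, class_means, log_prior=None, n_samples=32, seed=0):
    """Proxy for log p(c | x0), up to an additive constant, for every class c.

    Uses -E_{t, eps}[||eps - eps_hat(x_t, t, c)||^2] (an unweighted, ELBO-style proxy)
    in place of log p(x0 | c), then adds log p(c) when a prior is supplied.
    """
    rng = random.Random(seed)
    n_classes = len(class_means)
    if log_prior is None:
        log_prior = [0.0] * n_classes  # uniform prior: a shared constant, so it is dropped
    errors = [0.0] * n_classes
    for _ in range(n_samples):
        # Hypothetical noise schedule: cumulative alpha drawn uniformly from (0.05, 0.95).
        abar = rng.uniform(0.05, 0.95)
        eps = [rng.gauss(0.0, 1.0) for _ in x0]
        x_t = [math.sqrt(abar) * a + math.sqrt(1.0 - abar) * e for a, e in zip(x0, eps)]
        # Reuse the same (t, eps) pair for every class, so score differences
        # come from the conditioning alone.
        for c, mu_c in enumerate(class_means):
            eps_hat = toy_eps_pred(x_t, abar, mu_c)
            errors[c] += sum((e - eh) ** 2 for e, eh in zip(eps, eps_hat))
    return [log_prior[c] - errors[c] / n_samples for c in range(n_classes)]


def classify(x0, class_means, log_prior=None):
    # Enumerate all candidate classes and return the index with the highest score.
    scores = diffusion_scores(x0, class_means, log_prior)
    return max(range(len(scores)), key=scores.__getitem__)
```

For example, `classify([2.9, 3.1], [[0.0, 0.0], [3.0, 3.0], [-3.0, 1.0]])` returns `1`, because the class-1 conditioning yields the smallest denoising error. When two classes explain the input equally well, the `log_prior` term decides between them. The inner loop runs once per candidate and per noise sample, so cost grows linearly with the number of candidates. That is one way to see why exhaustive enumeration becomes heavy for detection, where a candidate prediction is a whole layout rather than a single label.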

Code Implementations: 1 repo