CV · Aug 15, 2025

Noise Matters: Optimizing Matching Noise for Diffusion Classifiers

arXiv:2508.11330v1 · 4 citations · h-index: 2

Originality: Incremental advance

AI Analysis

This work addresses a specific bottleneck in diffusion-based image classification for computer vision researchers, offering an incremental improvement by optimizing noise to enhance stability and speed.

The paper tackles noise instability in Diffusion Classifiers: classification performance varies significantly with the sampled noise, so existing methods fall back on slow ensembling over many noises. The proposed Noise Optimization method (NoOp) instead learns matching noises to achieve stable classification, and the authors report that it is effective across various datasets.

Although today's pretrained discriminative vision-language models (e.g., CLIP) have demonstrated strong perception abilities, such as zero-shot image classification, they also suffer from the bag-of-words problem and spurious bias. To mitigate these problems, some pioneering studies leverage powerful generative models (e.g., pretrained diffusion models) to realize generalizable image classification, dubbed Diffusion Classifier (DC). Specifically, DC randomly samples a Gaussian noise and classifies an image by comparing the denoising results obtained under different category conditions. Unfortunately, an inherent and notorious weakness of existing DCs is noise instability: different randomly sampled noises lead to significant performance changes. To achieve stable classification performance, existing DCs always ensemble the results of hundreds of sampled noises, which significantly reduces the classification speed. To this end, we first explore the role of noise in DC and conclude that there are some "good noises" that can relieve the instability. We further argue that these good noises should meet two principles: Frequency Matching and Spatial Matching. Following both principles, we propose a novel Noise Optimization method, NoOp, to learn matching (i.e., good) noise for DCs. For Frequency Matching, NoOp first optimizes a dataset-specific noise: given a dataset and a timestep t, it optimizes one randomly initialized, parameterized noise. For Spatial Matching, NoOp trains a Meta-Network that takes an image as input and outputs an image-specific noise offset. The sum of the optimized noise and the noise offset is used in DC in place of random noise. Extensive ablations on various datasets demonstrate the effectiveness of NoOp.
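The abstract gives no formulas, so the following is only a minimal sketch of where the noise enters a Diffusion Classifier. It assumes the common formulation in which the predicted class is the one whose condition yields the lowest noise-prediction error. The toy denoiser, the class prototypes, the fixed `dataset_noise`, and `offset_fn` are all hypothetical placeholders: in NoOp the dataset-level noise and the Meta-Network offset are learned, and the denoiser is a pretrained diffusion model.

```python
import math
import random

def add_noise(x, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + b * ei for xi, ei in zip(x, eps)]

def toy_eps_model(x_t, alpha_bar, prototype):
    """Hypothetical stand-in for a class-conditional denoiser.

    It assumes every image of the class equals `prototype`, so the noise it
    predicts is what remains after removing the scaled prototype from x_t.
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - a * p) / b for xt, p in zip(x_t, prototype)]

def classify(x, prototypes, noise, alpha_bar=0.5):
    """Return the class with the lowest noise-prediction error, plus all errors."""
    x_t = add_noise(x, noise, alpha_bar)
    errors = {}
    for label, proto in prototypes.items():
        eps_hat = toy_eps_model(x_t, alpha_bar, proto)
        errors[label] = sum((e - h) ** 2 for e, h in zip(noise, eps_hat))
    return min(errors, key=errors.get), errors

def matched_noise(dataset_noise, offset_fn, x):
    """NoOp-style noise: shared dataset-level noise plus an image-specific offset."""
    return [n + o for n, o in zip(dataset_noise, offset_fn(x))]

prototypes = {"cat": [1.0, 0.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0, 0.0]}
image = [0.9, 0.1, 0.0, 0.0]

# Baseline DC: one freshly sampled Gaussian noise.
rng = random.Random(0)
random_noise = [rng.gauss(0.0, 1.0) for _ in image]
baseline_label, _ = classify(image, prototypes, random_noise)

# NoOp-style: both parts are learned in the paper; here they are fixed placeholders.
dataset_noise = [0.1, -0.2, 0.3, 0.0]

def offset_fn(x):
    """Placeholder for the Meta-Network that maps an image to a noise offset."""
    return [0.05 * xi for xi in x]

noop_noise = matched_noise(dataset_noise, offset_fn, image)
noop_label, noop_errors = classify(image, prototypes, noop_noise)
print(baseline_label, noop_label)
```

Because the toy denoiser is exact, the error here does not depend on the noise, so both calls agree. With an imperfect real denoiser the error can vary with the sampled noise, which is the instability the abstract describes and the reason a single well-chosen noise could replace a large ensemble.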
