CV | AI | LG | Nov 10, 2025

Inference-Time Scaling of Diffusion Models for Infrared Data Generation

arXiv:2511.07362v1 | 1 citation | h-index: 11
Originality: Incremental advance
AI Analysis

This work addresses the scarcity of annotated infrared data for computer vision applications, offering an incremental improvement in synthetic data generation for low-data domains.

The paper tackles the problem of generating high-quality synthetic infrared images for training downstream vision models. It applies inference-time scaling with a domain-adapted CLIP-based verifier and reports a 10% reduction in FID on the KAIST Multispectral Pedestrian Detection Benchmark dataset relative to unguided baseline samples.

Infrared imagery enables temperature-based scene understanding using passive sensors, particularly under conditions of low visibility where traditional RGB imaging fails. Yet, developing downstream vision models for infrared applications is hindered by the scarcity of high-quality annotated data, due to the specialized expertise required for infrared annotation. While synthetic infrared image generation has the potential to accelerate model development by providing large-scale, diverse training data, training foundation-level generative diffusion models in the infrared domain has remained elusive due to limited datasets. In light of such data constraints, we explore an inference-time scaling approach using a domain-adapted CLIP-based verifier for enhanced infrared image generation quality. We adapt FLUX.1-dev, a state-of-the-art text-to-image diffusion model, to the infrared domain by finetuning it on a small sample of infrared images using parameter-efficient techniques. The trained verifier is then employed during inference to guide the diffusion sampling process toward higher quality infrared generations that better align with input text prompts. Empirically, we find that our approach leads to consistent improvements in generation quality, reducing FID scores on the KAIST Multispectral Pedestrian Detection Benchmark dataset by 10% compared to unguided baseline samples. Our results suggest that inference-time guidance offers a promising direction for bridging the domain gap in low-data infrared settings.
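The abstract does not spell out the search procedure, so the following is only a minimal sketch of one common form of verifier-guided inference-time scaling: best-of-N selection, where several candidates are sampled from different seeds and the one the verifier scores highest is kept. The names `best_of_n`, `toy_generate`, and `toy_verify` are hypothetical. The toy functions stand in for the finetuned FLUX.1-dev sampler and the domain-adapted CLIP-based verifier, and do not reflect the authors' implementation.

```python
import random
from typing import Callable, List, Tuple

Sample = List[float]  # placeholder for an image; a real pipeline would return pixels


def best_of_n(
    prompt: str,
    generate: Callable[[str, int], Sample],
    verify: Callable[[str, Sample], float],
    n_candidates: int = 8,
    base_seed: int = 0,
) -> Tuple[Sample, float, int]:
    """Draw n_candidates samples from consecutive seeds and return the
    (sample, score, seed) triple with the highest verifier score."""
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    best_sample, best_score, best_seed = None, float("-inf"), base_seed
    for offset in range(n_candidates):
        seed = base_seed + offset
        sample = generate(prompt, seed)   # assumed: one full sampling run per seed
        score = verify(prompt, sample)    # assumed: higher score means better
        if best_sample is None or score > best_score:
            best_sample, best_score, best_seed = sample, score, seed
    return best_sample, best_score, best_seed


def toy_generate(prompt: str, seed: int) -> Sample:
    """Hypothetical stand-in for a diffusion sampler: 16 seeded random values."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(16)]


def toy_verify(prompt: str, sample: Sample) -> float:
    """Hypothetical stand-in for a verifier: prefers a mean value near 0.5."""
    mean = sum(sample) / len(sample)
    return -abs(mean - 0.5)


if __name__ == "__main__":
    _, score, seed = best_of_n("pedestrians at night, infrared", toy_generate, toy_verify)
    print(f"kept seed {seed} with verifier score {score:.4f}")
```

Raising `n_candidates` spends more inference compute per prompt. Because the candidate set for a larger N contains the set for a smaller N (same `base_seed`), the selected verifier score can only stay the same or improve. Whether that translates into a lower FID depends on how well the verifier tracks true image quality.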
