CV · Oct 7, 2025

Fine-grained Defocus Blur Control for Generative Image Models

arXiv:2510.06215v1 · h-index: 28
Originality: Incremental advance
AI Analysis

The paper addresses a specific limitation of generative image models for users who need camera-like artistic control. The advance is incremental, since it builds on existing diffusion and lens blur models.

The paper tackles the inability of text-to-image diffusion models to incorporate fine-grained camera metadata such as aperture settings. It introduces a framework that gives users precise, interactive control over defocus effects while preserving scene contents. The reported experiments show finer-grained control than existing diffusion models, without altering the depicted scene.

Current text-to-image diffusion models excel at generating diverse, high-quality images, yet they struggle to incorporate fine-grained camera metadata such as precise aperture settings. In this work, we introduce a novel text-to-image diffusion framework that leverages camera metadata, or EXIF data, which is often embedded in image files, with an emphasis on generating controllable lens blur. Our method mimics the physical image formation process by first generating an all-in-focus image, estimating its monocular depth, predicting a plausible focus distance with a novel focus distance transformer, and then forming a defocused image with an existing differentiable lens blur model. Gradients flow backwards through this whole process, allowing us to learn without explicit supervision to generate defocus effects based on content elements and the provided EXIF data. At inference time, this enables precise interactive user control over defocus effects while preserving scene contents, which is not achievable with existing diffusion models. Experimental results demonstrate that our model enables superior fine-grained control without altering the depicted scene.
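
To make the role of aperture and focus distance concrete, the sketch below computes the circle-of-confusion diameter from the standard thin-lens model. This is not the paper's lens blur model, which the abstract only describes as "an existing differentiable lens blur model"; the function name `coc_diameter_mm` and its parameters are hypothetical and chosen for illustration. It shows only how per-pixel depth, focus distance, and f-number could combine into a blur size.

```python
def coc_diameter_mm(focal_length_mm, f_number, focus_dist_mm, depth_mm):
    """Circle-of-confusion diameter on the sensor under the thin-lens model.

    Illustrative assumption, not the paper's method:
        c = A * f * |d - s| / (d * (s - f)),  with aperture diameter A = f / N
    where f is the focal length, N the f-number, s the focus distance,
    and d the scene depth of the point being imaged (all in millimetres).
    """
    if focus_dist_mm <= focal_length_mm:
        raise ValueError("focus distance must exceed the focal length")
    if depth_mm <= 0:
        raise ValueError("depth must be positive")
    aperture_mm = focal_length_mm / f_number  # wider aperture = smaller N
    return (
        aperture_mm
        * focal_length_mm
        * abs(depth_mm - focus_dist_mm)
        / (depth_mm * (focus_dist_mm - focal_length_mm))
    )


if __name__ == "__main__":
    # A 50 mm lens focused at 2 m, with a background point at 4 m.
    for n in (1.4, 2.0, 8.0):
        c = coc_diameter_mm(50.0, n, 2000.0, 4000.0)
        print(f"f/{n}: blur diameter {c:.3f} mm")
```

A point at the focus distance gets zero blur, and lowering the f-number enlarges the blur for every other depth. In the pipeline the abstract describes, a quantity of this kind would be evaluated per pixel from the estimated depth map and the predicted focus distance.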