BokehDepth: Enhancing Monocular Depth Estimation through Bokeh Generation
This work addresses a domain-specific computer vision problem, improving monocular depth estimation and bokeh rendering, and reports incremental gains over existing methods.
The paper addresses the incomplete exploitation of the link between bokeh generation and monocular depth estimation by introducing BokehDepth, a two-stage framework that uses defocus as a supervision-free geometric cue. It reports improved visual fidelity over depth-map-based bokeh baselines and consistent gains in metric accuracy and robustness for monocular depth models.
Bokeh and monocular depth estimation are tightly coupled through the same lens imaging geometry, yet current methods exploit this connection in incomplete ways. High-quality bokeh rendering pipelines typically depend on noisy depth maps, which amplify estimation errors into visible artifacts, while modern monocular metric depth models still struggle on weakly textured, distant, and geometrically ambiguous regions where defocus cues are most informative. We introduce BokehDepth, a two-stage framework that decouples bokeh synthesis from depth prediction and treats defocus as an auxiliary supervision-free geometric cue. In Stage-1, a physically guided controllable bokeh generator, built on a powerful pretrained image editing backbone, produces depth-free bokeh stacks with calibrated bokeh strength from a single sharp input. In Stage-2, a lightweight defocus-aware aggregation module plugs into existing monocular depth encoders, fuses features along the defocus dimension, and exposes stable depth-sensitive variations while leaving the downstream decoder unchanged. Across challenging benchmarks, BokehDepth improves visual fidelity over depth-map-based bokeh baselines and consistently boosts the metric accuracy and robustness of strong monocular depth foundation models.
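The coupling through lens imaging geometry can be illustrated with the textbook thin-lens circle-of-confusion relation. The sketch below is an illustration under that standard model only; it is not the paper's Stage-1 generator, which is described as depth-free and built on a pretrained image editing backbone. The function names `coc_diameter` and `coc_stack`, and all parameter values, are hypothetical choices for this example.

```python
def coc_diameter(depth_m, focus_m, focal_length_m, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor, in metres.

    Textbook formula (an assumption of this sketch, not taken from the paper):
        c = A * f * |d - d_f| / (d * (d_f - f)),  with aperture A = f / N
    where d is the scene depth, d_f the focus distance, f the focal length
    and N the f-number.
    """
    if depth_m <= 0:
        raise ValueError("depth must be positive")
    if focus_m <= focal_length_m:
        raise ValueError("focus distance must exceed the focal length")
    aperture = focal_length_m / f_number
    return (aperture * focal_length_m * abs(depth_m - focus_m)
            / (depth_m * (focus_m - focal_length_m)))


def coc_stack(depth_m, focus_m, focal_length_m, f_numbers):
    """Blur diameters for one scene point across several aperture settings."""
    return [coc_diameter(depth_m, focus_m, focal_length_m, n) for n in f_numbers]


if __name__ == "__main__":
    # Hypothetical settings: 50 mm lens focused at 2 m, three apertures.
    for depth in (1.0, 2.0, 4.0, 8.0):
        blur_mm = [1e3 * c for c in coc_stack(depth, 2.0, 0.05, (1.4, 2.8, 5.6))]
        print(depth, [round(b, 4) for b in blur_mm])
```

Under this model, a point at the focus distance stays sharp at every aperture, while the blur of an out-of-focus point grows as the aperture widens. That variation across a stack of blur strengths is the kind of depth-sensitive signal a defocus-aware module could aggregate.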