RGB-D-Fusion: Image Conditioned Depth Diffusion of Humanoid Subjects
This work addresses depth estimation for humanoid subjects. The contribution is incremental: it builds on existing diffusion models and adds task-specific enhancements.
The paper generates high-resolution depth maps from low-resolution RGB images of humanoid subjects, using a multi-modal conditional diffusion model together with a novel depth noise augmentation technique.
We present RGB-D-Fusion, a multi-modal conditional denoising diffusion probabilistic model that generates high-resolution depth maps from low-resolution monocular RGB images of humanoid subjects. RGB-D-Fusion first generates a low-resolution depth map using an image-conditioned denoising diffusion probabilistic model and then upsamples the depth map using a second denoising diffusion probabilistic model conditioned on a low-resolution RGB-D image. We further introduce a novel augmentation technique, depth noise augmentation, to increase the robustness of our super-resolution model.
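The two-stage data flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`two_stage_pipeline`, `depth_noise_augmentation`, `stub_depth_model`, `stub_sr_model`) are hypothetical, the two diffusion models are replaced by trivial stand-ins so the example runs, and the augmentation is assumed to add Gaussian noise to the depth channel of the low-resolution RGB-D conditioning, which the text above does not specify.

```python
import numpy as np

def depth_noise_augmentation(rgbd_lr, sigma, rng):
    """Perturb only the depth channel of an (H, W, 4) RGB-D array.

    Assumption: the augmentation is additive Gaussian noise on depth;
    the source text does not give its exact form.
    """
    out = rgbd_lr.copy()
    out[..., 3] += rng.normal(0.0, sigma, size=out.shape[:-1])
    return out

def stub_depth_model(rgb_lr):
    """Stand-in for stage 1 (image-conditioned diffusion): returns (H, W)."""
    return rgb_lr.mean(axis=-1)

def stub_sr_model(rgbd_lr, factor):
    """Stand-in for stage 2 (RGB-D-conditioned diffusion): nearest-neighbour
    upsampling of the depth channel, returns (H * factor, W * factor)."""
    depth = rgbd_lr[..., 3]
    return depth.repeat(factor, axis=0).repeat(factor, axis=1)

def two_stage_pipeline(rgb_lr, depth_model, sr_model, factor):
    """Low-res RGB -> low-res depth -> low-res RGB-D -> high-res depth."""
    depth_lr = depth_model(rgb_lr)                       # stage 1
    rgbd_lr = np.concatenate([rgb_lr, depth_lr[..., None]], axis=-1)
    return sr_model(rgbd_lr, factor)                     # stage 2

rng = np.random.default_rng(0)
rgb_lr = rng.random((8, 8, 3))                           # toy low-res RGB image
depth_hr = two_stage_pipeline(rgb_lr, stub_depth_model, stub_sr_model, factor=4)

# During training of the super-resolution stage, the conditioning would be
# perturbed before being passed to the model (assumed usage).
rgbd_lr = np.concatenate([rgb_lr, stub_depth_model(rgb_lr)[..., None]], axis=-1)
rgbd_aug = depth_noise_augmentation(rgbd_lr, sigma=0.1, rng=rng)
```

Perturbing only the depth channel reflects one plausible motivation: at inference time the second stage receives depth predicted by the first stage rather than ground truth, so training on noisy depth may make it more tolerant of first-stage errors.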