3DFS: Deformable Dense Depth Fusion and Segmentation for Object Reconstruction from a Handheld Camera
This work addresses detailed object reconstruction for robotics and AR applications, but it appears incremental because it builds on existing depth fusion and segmentation techniques.
The paper tackles 3D reconstruction and segmentation of a single object from handheld camera video. It proposes methods for dense depth estimation, depth fusion, and segmentation, and evaluates them qualitatively on a new dataset and quantitatively on depth estimation and segmentation tasks.
We propose an approach for 3D reconstruction and segmentation of a single object placed on a flat surface from an input video. We first estimate dense depth maps for multiple views using a proposed objective function that preserves detail. The resulting depth maps are then fused using a proposed implicit surface function that is robust to estimation error, producing a smooth surface reconstruction of the entire scene. Finally, the object is segmented from the rest of the scene using a proposed 2D-3D segmentation method that incorporates image and depth cues with priors and regularization over the 3D volume and 2D segmentations. We evaluate 3D reconstructions qualitatively on our Object-Videos dataset, comparing against fusion, multiview stereo, and segmentation baselines. We also quantitatively evaluate the dense depth estimation on the RGBD Scenes V2 dataset [Henry et al. 2013] and the segmentation on keyframe annotations of the Object-Videos dataset.
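To make the fusion step concrete, the sketch below fuses several depth estimates along a single camera ray into an implicit surface and recovers the surface at its zero crossing. It uses a standard truncated signed distance function (TSDF) with uniform averaging, which is a stand-in and not the implicit surface function proposed in the paper. The function names, the truncation distance, and the sample values are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional


def tsdf_along_ray(voxel_depths: List[float],
                   observed_depths: List[float],
                   trunc: float) -> List[Optional[float]]:
    """Average truncated signed distances for voxels sampled along one ray.

    Generic TSDF averaging (an assumption), not the paper's fusion function.
    """
    fused: List[Optional[float]] = []
    for z in voxel_depths:
        total, count = 0.0, 0
        for d in observed_depths:
            sdf = d - z  # positive in front of the observed surface
            if sdf < -trunc:
                continue  # voxel lies far behind the surface: no evidence
            total += min(sdf, trunc) / trunc  # clamp and normalize to at most 1
            count += 1
        fused.append(total / count if count else None)
    return fused


def zero_crossing(voxel_depths: List[float],
                  tsdf: List[Optional[float]]) -> Optional[float]:
    """Linearly interpolate the depth where the fused TSDF goes from + to -."""
    for i in range(len(tsdf) - 1):
        a, b = tsdf[i], tsdf[i + 1]
        if a is None or b is None:
            continue
        if a > 0 >= b:
            t = a / (a - b)
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


# Three noisy depth estimates (in meters) of the same surface point.
ray = [0.8, 0.9, 1.0, 1.1, 1.2]
fused = tsdf_along_ray(ray, [0.98, 1.00, 1.02], trunc=0.1)
surface_depth = zero_crossing(ray, fused)  # approximately 1.0
```

Averaging in the implicit domain lets individual noisy depth maps cancel out before the surface is extracted. A fusion function that is robust to estimation error, as the abstract describes, would presumably replace the plain average with something less sensitive to outliers.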