CV · Dec 16, 2024

SpatialMe: Stereo Video Conversion Using Depth-Warping and Blend-Inpainting

Jiale Zhang, Qianxi Jia, Yang Liu, Wei Zhang, Wei Wei, Xin Tian

arXiv:2412.11512v1 · 9.68 citations · h-index: 107 · ICME

Originality: Incremental advance

AI Analysis

This work addresses the challenge of creating immersive stereo videos from monocular inputs. It is rated incremental because it builds on existing novel view synthesis techniques.

The paper tackles monocular-to-stereo video conversion by introducing SpatialMe, a framework based on depth-warping and blend-inpainting. The authors report that it outperforms state-of-the-art methods in extensive experiments.

Stereo video conversion aims to transform monocular videos into an immersive stereo format. Despite advances in novel view synthesis, two major challenges remain: i) the difficulty of achieving high-fidelity and stable results, and ii) the scarcity of high-quality stereo video data. In this paper, we introduce SpatialMe, a novel stereo video conversion framework based on depth-warping and blend-inpainting. Specifically, we propose a mask-based hierarchy feature update (MHFU) refiner, which integrates and refines the outputs of a purpose-designed multi-branch inpainting module using a feature update unit (FUU) and a mask mechanism. We also propose a disparity expansion strategy to address the problem of foreground bleeding. Furthermore, we construct a high-quality real-world stereo video dataset, StereoV1K, to alleviate the data shortage. It contains 1000 stereo videos captured in the real world at a resolution of 1180 x 1180, covering various indoor and outdoor scenes. Extensive experiments demonstrate the superiority of our approach over state-of-the-art methods in generating stereo videos.
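To make the depth-warping step concrete, here is a minimal sketch of generic disparity-based forward warping in plain Python. It is not the SpatialMe implementation: the function name `warp_to_right_view`, the integer disparities, and the "largest disparity wins" rule for overlapping pixels are all assumptions chosen for illustration. The sketch shows why warping alone leaves holes (disoccluded regions), which is the gap an inpainting stage would have to fill.

```python
def warp_to_right_view(left, disparity):
    """Forward-warp a left view into a right view using per-pixel disparity.

    Hypothetical helper for illustration only (not the paper's method).
    left: 2D list of pixel values (rows of equal length).
    disparity: 2D list of non-negative integer disparities, same shape.
    Returns (right, hole_mask); hole_mask[y][x] is True where no source
    pixel landed, i.e. a disoccluded region left for inpainting.
    """
    height, width = len(left), len(left[0])
    right = [[None] * width for _ in range(height)]
    # Z-buffer keyed on disparity: nearer pixels (larger disparity) win.
    best = [[-1] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d = disparity[y][x]
            tx = x - d  # scene points shift left in the right view
            if 0 <= tx < width and d > best[y][tx]:
                best[y][tx] = d
                right[y][tx] = left[y][x]
    hole_mask = [[value is None for value in row] for row in right]
    return right, hole_mask


# One scanline: two foreground pixels (disparity 2) over a flat background.
left = [[10, 20, 30, 40, 50]]
disparity = [[0, 0, 2, 2, 0]]
right, holes = warp_to_right_view(left, disparity)
print(right)  # [[30, 40, None, None, 50]]
print(holes)  # [[False, False, True, True, False]]
```

In this toy scanline the foreground pixels move two positions to the left and overwrite the background there, while the positions they vacated receive no source pixel and are flagged in the hole mask.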
