Mantang Guo

h-index: 9

7 papers

66 citations

Novelty: 56%

AI Score: 41

Ranked #90,578 of 205,806 authors (top 44%); #30,243 in CV (top 51%)

7 Papers

CV | Sep 12, 2022

Learning A Locally Unified 3D Point Cloud for View Synthesis

Meng You, Mantang Guo, Xianqiang Lyu et al.

In this paper, we explore the problem of 3D point cloud representation-based view synthesis from a set of sparse source views. To tackle this challenging problem, we propose a new deep learning-based view synthesis paradigm that learns a locally unified 3D point cloud from source views. Specifically, we first construct sub-point clouds by projecting source views to 3D space based on their depth maps. Then, we learn the locally unified 3D point cloud by adaptively fusing points within a local neighborhood defined on the union of the sub-point clouds. We also propose a 3D geometry-guided image restoration module to fill the holes and recover high-frequency details of the rendered novel views. Experimental results on three benchmark datasets demonstrate that our method can improve the average PSNR by more than 4 dB while preserving more accurate visual details, compared with state-of-the-art view synthesis methods.
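The first step described above, projecting a source view into 3D space from its depth map, can be sketched as follows. This is a minimal illustration that assumes a standard pinhole camera model; the function name `unproject_depth` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and not taken from the paper. The transform from camera space to a shared world frame, which would be needed before taking the union of sub-point clouds, is omitted.

```python
def unproject_depth(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-space 3D points.

    depth is a list of rows; depth[v][u] is the depth z at pixel (u, v).
    fx, fy, cx, cy are assumed pinhole intrinsics (hypothetical names).
    Pixels with non-positive depth are treated as invalid and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            # Invert the pinhole projection u = fx * x / z + cx (same for v).
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# A 2x2 depth map with one invalid pixel yields three 3D points.
cloud = unproject_depth([[2.0, 0.0], [2.0, 4.0]], fx=2.0, fy=2.0, cx=0.0, cy=0.0)
print(cloud)
```

Each source view produces one such sub-point cloud; the learned local fusion itself is not reproduced here.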

CV | Dec 3, 2025

ReCamDriving: LiDAR-Free Camera-Controlled Novel Trajectory Video Generation

Yaokun Li, Shuaixian Wang, Mantang Guo et al.

We propose ReCamDriving, a purely vision-based, camera-controlled novel-trajectory video generation framework. While repair-based methods fail to restore complex artifacts and LiDAR-based approaches rely on sparse and incomplete cues, ReCamDriving leverages dense and scene-complete 3DGS renderings for explicit geometric guidance, achieving precise camera-controllable generation. To mitigate overfitting to restoration behaviors when conditioned on 3DGS renderings, ReCamDriving adopts a two-stage training paradigm: the first stage uses camera poses for coarse control, while the second stage incorporates 3DGS renderings for fine-grained viewpoint and geometric guidance. Furthermore, we present a 3DGS-based cross-trajectory data curation strategy to eliminate the train-test gap in camera transformation patterns, enabling scalable multi-trajectory supervision from monocular videos. Based on this strategy, we construct the ParaDrive dataset, containing over 110K parallel-trajectory video pairs. Extensive experiments demonstrate that ReCamDriving achieves state-of-the-art camera controllability and structural consistency.

CV | Jan 22, 2022 | Code

Content-aware Warping for View Synthesis

Mantang Guo, Junhui Hou, Jing Jin et al.

Existing image-based rendering methods usually adopt a depth-based image warping operation to synthesize novel views. In this paper, we identify the essential limitations of the traditional warping operation as its limited neighborhood and its purely distance-based interpolation weights. To this end, we propose content-aware warping, which adaptively learns the interpolation weights for pixels of a relatively large neighborhood from their contextual information via a lightweight neural network. Based on this learnable warping module, we propose a new end-to-end learning-based framework for novel view synthesis from a set of input source views, in which two additional modules, namely confidence-based blending and feature-assistant spatial refinement, are introduced to handle the occlusion issue and capture the spatial correlation among pixels of the synthesized view, respectively. We also propose a weight-smoothness loss term to regularize the network. Experimental results on light field datasets with wide baselines and multi-view datasets show that the proposed method significantly outperforms state-of-the-art methods both quantitatively and visually. The source code will be publicly available at https://github.com/MantangGuo/CW4VS.
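For context, the traditional warping step that the abstract contrasts itself with can be sketched as plain bilinear sampling. The sketch below is an assumption about what "limited neighborhood and distance-based weights" looks like in practice, not the authors' code: the weights depend only on the sub-pixel offset and cover only the four nearest pixels. The function names are hypothetical.

```python
import math


def bilinear_weights(x, y):
    """Return {(px, py): weight} for the 4 pixels surrounding (x, y).

    The weights depend only on the fractional offsets (distance-based)
    and are independent of image content.
    """
    x0, y0 = math.floor(x), math.floor(y)
    dx, dy = x - x0, y - y0
    return {
        (x0, y0): (1 - dx) * (1 - dy),
        (x0 + 1, y0): dx * (1 - dy),
        (x0, y0 + 1): (1 - dx) * dy,
        (x0 + 1, y0 + 1): dx * dy,
    }


def sample(image, x, y):
    """Sample image (list of rows) at a fractional location.

    Neighbors that fall outside the image contribute zero.
    """
    total = 0.0
    for (px, py), w in bilinear_weights(x, y).items():
        if 0 <= py < len(image) and 0 <= px < len(image[0]):
            total += w * image[py][px]
    return total


print(sample([[0, 10], [20, 30]], 0.5, 0.5))
```

A content-aware scheme, as described in the abstract, would replace the fixed dictionary of four weights with weights predicted by a network over a larger neighborhood.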

CV | Aug 17, 2021 | Code

Learning Dynamic Interpolation for Extremely Sparse Light Fields with Wide Baselines

Mantang Guo, Jing Jin, Hui Liu et al.

In this paper, we tackle the problem of dense light field (LF) reconstruction from sparsely-sampled ones with wide baselines and propose a learnable model, namely dynamic interpolation, to replace the commonly-used geometry warping operation. Specifically, with the estimated geometric relation between input views, we first construct a lightweight neural network to dynamically learn weights for interpolating neighbouring pixels from input views to synthesize each pixel of novel views independently. In contrast to the fixed and content-independent weights employed in the geometry warping operation, the learned interpolation weights implicitly incorporate the correspondences between the source and novel views and adapt to different image content information. Then, we recover the spatial correlation between the independently synthesized pixels of each novel view by referring to that of input views using a geometry-based spatial refinement module. We also constrain the angular correlation between the novel views through a disparity-oriented LF structure loss. Experimental results on LF datasets with wide baselines show that the reconstructed LFs achieve much higher PSNR/SSIM and preserve the LF parallax structure better than state-of-the-art methods. The source code is publicly available at https://github.com/MantangGuo/DI4SLF.
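The PSNR metric reported in results like these can be computed as in the sketch below. It assumes 8-bit intensities with a peak value of 255 and images stored as lists of rows; the evaluation protocol of the paper itself (color space, cropping, averaging over views) is not specified here and may differ.

```python
import math


def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    squared_error = 0.0
    count = 0
    for ref_row, est_row in zip(reference, estimate):
        for r, e in zip(ref_row, est_row):
            squared_error += (r - e) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


reference = [[0, 0], [0, 0]]
# Halving the per-pixel error raises PSNR by 20 * log10(2) dB.
gain = psnr(reference, [[5, 5], [5, 5]]) - psnr(reference, [[10, 10], [10, 10]])
print(round(gain, 2))
```

Because PSNR is logarithmic in the mean squared error, a gain of a few dB corresponds to a substantial reduction in reconstruction error.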

IV | Feb 14, 2021

Light Field Reconstruction via Deep Adaptive Fusion of Hybrid Lenses

Jing Jin, Mantang Guo, Junhui Hou et al.

This paper explores the problem of reconstructing high-resolution light field (LF) images from hybrid lenses, including a high-resolution camera surrounded by multiple low-resolution cameras. The performance of existing methods is still limited, as they produce either blurry results on plain textured areas or distortions around depth discontinuous boundaries. To tackle this challenge, we propose a novel end-to-end learning-based approach, which can comprehensively utilize the specific characteristics of the input from two complementary and parallel perspectives. Specifically, one module regresses a spatially consistent intermediate estimation by learning a deep multidimensional and cross-domain feature representation, while the other module warps another intermediate estimation, which maintains the high-frequency textures, by propagating the information of the high-resolution view. We finally leverage the advantages of the two intermediate estimations adaptively via the learned attention maps, leading to the final high-resolution LF image with satisfactory results on both plain textured areas and depth discontinuous boundaries. Besides, to promote the effectiveness of our method trained with simulated hybrid data on real hybrid data captured by a hybrid LF imaging system, we carefully design the network architecture and the training strategy. Extensive experiments on both real and simulated hybrid data demonstrate the significant superiority of our approach over state-of-the-art ones. To the best of our knowledge, this is the first end-to-end deep learning method for LF reconstruction from a real hybrid input. We believe our framework could potentially decrease the cost of high-resolution LF data acquisition and benefit LF data storage and transmission.
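The final fusion step, combining two intermediate estimates through attention maps, might look like the sketch below. The abstract does not give the fusion formula, so the per-pixel convex blend used here is an assumption, and the function name `fuse_with_attention` is hypothetical. In the paper the attention map is learned; here it is simply passed in.

```python
def fuse_with_attention(regressed, warped, attention):
    """Blend two same-sized images pixel by pixel.

    attention[v][u] in [0, 1] is the weight given to the regressed
    estimate; the warped estimate receives the complementary weight.
    This convex blend is an illustrative assumption.
    """
    fused = []
    for r_row, w_row, a_row in zip(regressed, warped, attention):
        fused.append([a * r + (1 - a) * w for r, w, a in zip(r_row, w_row, a_row)])
    return fused


# First pixel trusts the regressed estimate fully, second mostly the warped one.
print(fuse_with_attention([[10, 10]], [[20, 20]], [[1.0, 0.25]]))
```

Under this assumption, an attention value near 1 favors the spatially consistent regressed estimate, and a value near 0 favors the warped estimate that carries high-frequency texture.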

CV | Feb 15, 2019

Breaking the Spatio-Angular Trade-off for Light Field Super-Resolution via LSTM Modelling on Epipolar Plane Images

Hao Zhu, Mantang Guo, Hongdong Li et al.

Light-field cameras (LFCs) have received increasing attention due to their widespread applications. However, current LFCs suffer from the well-known spatio-angular trade-off, which is considered an inherent and fundamental limit for LFC designs. In this paper, through a detailed geometrical-optics analysis of the sampling process in an LFC, we show that the effective sampling resolution is generally higher than the number of micro-lenses. This contribution makes it theoretically possible to break the resolution trade-off. Our second contribution is an epipolar plane image (EPI) based super-resolution method, which can super-resolve the spatial and angular dimensions simultaneously. We prove that the light field is a 2D series; thus, a specifically designed CNN-LSTM network is proposed to capture the continuity property of the EPI. Rather than leveraging semantic information, our network focuses on extracting geometric continuity in the EPI. This gives our method an improved generalization ability and makes it applicable to a wide range of previously unseen scenes. Experiments on both synthetic and real light fields demonstrate the improvements over state-of-the-art methods, especially in large disparity areas.
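An epipolar plane image can be illustrated by slicing a 4D light field at a fixed vertical view index and a fixed image row. The sketch below assumes the indexing `light_field[s][t][y][x]` (horizontal view, vertical view, row, column) and a toy scene with a single feature of constant integer disparity; both the layout and the helper names are hypothetical, and sign conventions for disparity vary between datasets.

```python
def synthetic_light_field(num_s, num_t, height, width, x0, disparity):
    """Build a toy light field containing one vertical line feature.

    The feature sits at column x0 in view s = 0 and shifts by
    `disparity` pixels per step in s.
    """
    lf = [[[[0.0] * width for _ in range(height)]
           for _ in range(num_t)]
          for _ in range(num_s)]
    for s in range(num_s):
        x = x0 + disparity * s
        if 0 <= x < width:
            for t in range(num_t):
                for y in range(height):
                    lf[s][t][y][x] = 1.0
    return lf


def extract_epi(light_field, t, y):
    """Return the EPI over (s, x) for a fixed vertical view t and row y."""
    return [list(views_at_s[t][y]) for views_at_s in light_field]


lf = synthetic_light_field(num_s=3, num_t=1, height=1, width=8, x0=1, disparity=2)
for row in extract_epi(lf, t=0, y=0):
    print(row)
```

In the resulting EPI the feature traces a straight line whose slope is set by its disparity, which is the kind of geometric continuity the abstract refers to.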

CV | Jun 14, 2018

Dense Light Field Reconstruction From Sparse Sampling Using Residual Network

Mantang Guo, Hao Zhu, Guoqing Zhou et al.

A light field records numerous light rays from a real-world scene. However, capturing a dense light field with existing devices is a time-consuming process. Moreover, reconstructing a large number of light rays, equivalent to multiple light fields, from sparse sampling poses a severe challenge for existing methods. In this paper, we present a learning-based method to reconstruct multiple novel light fields between two mutually independent light fields. We show that light rays distributed in different light fields obey the same consistency constraints under a certain condition. The most significant constraint is a depth-related correlation between the angular and spatial dimensions. Our method avoids explicitly working out this error-sensitive constraint by employing a deep neural network. We solve for the residual values of pixels on epipolar plane images (EPIs) to reconstruct novel light fields. Our method is able to reconstruct 2 to 4 novel light fields between two mutually independent input light fields. We also compare our results with those yielded by a number of alternatives from the literature, which shows that our reconstructed light fields have better structural similarity and occlusion relationships.