CV · Apr 12, 2019

Multi-View Stereo by Temporal Nonparametric Fusion

arXiv:1904.06397v2 · 99 citations
AI Analysis

This work addresses multi-view depth estimation, presenting an incremental improvement in the form of a fusion method that shares information across views.

The paper tackles depth estimation from multi-view images by introducing a model that leverages information from previous scene encodings through a nonparametric Gaussian process prior, achieving real-time performance on smart devices.

We propose a novel idea for depth estimation from multi-view image-pose pairs, where the model has the capability to leverage information from previous latent-space encodings of the scene. This model uses pairs of images and poses, which are passed through an encoder-decoder model for disparity estimation. The novelty lies in soft-constraining the bottleneck layer by a nonparametric Gaussian process prior. We propose a pose-kernel structure that encourages similar poses to have similar latent-space encodings. The flexibility of the Gaussian process (GP) prior provides an adaptive memory for fusing information from previous views. We train the encoder-decoder and the GP hyperparameters jointly end-to-end. In addition to a batch method, we derive a lightweight estimation scheme that circumvents standard pitfalls in scaling Gaussian process inference, and demonstrate how our scheme can run in real time on smart devices.
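The pose-kernel idea in the abstract can be illustrated with a minimal sketch of batch GP fusion over latent codes. The abstract does not give the kernel or the pose distance, so the choices below are assumptions made for illustration: a distance combining translation and rotation differences, a Matérn 3/2 kernel, a Gaussian noise model on the encoder outputs, and the hypothetical function names `pose_distance`, `matern32`, and `fuse_latents`. This covers only the batch form, not the lightweight real-time scheme.

```python
import numpy as np


def pose_distance(R, t):
    """Pairwise distance between camera poses (assumed form, for illustration).

    R: (N, 3, 3) rotation matrices, t: (N, 3) translations.
    Returns an (N, N) matrix with
    sqrt(||t_i - t_j||^2 + (2/3) * trace(I - R_i^T R_j)).
    """
    dt2 = np.sum((t[:, None, :] - t[None, :, :]) ** 2, axis=-1)
    # trace(R_i^T R_j) equals the elementwise (Frobenius) inner product.
    tr = np.einsum("iab,jab->ij", R, R)
    rot = (2.0 / 3.0) * (3.0 - tr)
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(dt2 + rot, 0.0))


def matern32(D, magnitude=1.0, lengthscale=1.0):
    """Matern 3/2 kernel evaluated on a distance matrix D (assumed kernel)."""
    s = np.sqrt(3.0) * D / lengthscale
    return magnitude**2 * (1.0 + s) * np.exp(-s)


def fuse_latents(Y, R, t, noise_var=0.1, magnitude=1.0, lengthscale=1.0):
    """GP posterior mean of the latent codes given all views (batch form).

    Y: (N, D) flattened bottleneck encodings, one row per view.
    Each latent dimension is treated as an independent GP over poses,
    so the fused codes are K (K + noise_var * I)^{-1} Y.
    """
    K = matern32(pose_distance(R, t), magnitude, lengthscale)
    n = K.shape[0]
    # Cholesky solve instead of an explicit inverse for numerical stability.
    L = np.linalg.cholesky(K + noise_var * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    return K @ alpha
```

In this sketch, views taken from nearby poses have strongly correlated latent codes, so their encodings are pulled toward each other, while views from distant poses are left nearly independent. The batch solve costs O(N^3) in the number of views, which is the kind of scaling issue the abstract's lightweight scheme is meant to avoid.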

Code Implementations · 1 repo
