Learned Multi-Patch Similarity
This work addresses a fundamental bottleneck in multi-view computer vision that matters to both researchers and practitioners, but the contribution is incremental because it builds on existing machine learning techniques.
The paper tackles the problem of measuring similarity across more than two image patches in multi-view depth estimation. It proposes a learned matching function that maps multiple patches directly to a scalar similarity score, and experiments show advantages over methods based on pairwise similarity.
Estimating a depth map from multiple views of a scene is a fundamental task in computer vision. As soon as more than two viewpoints are available, one faces the very basic question of how to measure similarity across more than two image patches. Surprisingly, no direct solution exists; instead, it is common to fall back to more or less robust averaging of two-view similarities. Encouraged by the success of machine learning, and in particular convolutional neural networks, we propose to learn a matching function which directly maps multiple image patches to a scalar similarity score. Experiments on several multi-view datasets demonstrate that this approach has advantages over methods based on pairwise patch similarity.
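To make the contrast in the abstract concrete, the following minimal sketch places a pairwise-averaging baseline next to a function that scores all patches jointly. Everything in it is an illustrative assumption, not the paper's network: the function names, the choice of normalized cross-correlation as the two-view measure, and the toy "shared branch, mean over patches, scalar output" structure with random (untrained) weights are hypothetical stand-ins for a learned convolutional model.

```python
import math
import random
from itertools import combinations


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two flattened patches."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def pairwise_average_similarity(patches):
    """Baseline: average a two-view similarity over all patch pairs."""
    scores = [ncc(a, b) for a, b in combinations(patches, 2)]
    return sum(scores) / len(scores)


def multi_patch_similarity(patches, w_feat, w_out):
    """Hypothetical direct n-patch score (assumed structure, untrained).

    Each patch goes through the same linear + ReLU branch, the features are
    averaged over patches, and a linear layer with a sigmoid maps the pooled
    feature vector to one scalar in (0, 1).
    """
    feats = [
        [max(0.0, sum(w * x for w, x in zip(row, p))) for row in w_feat]
        for p in patches
    ]
    n = len(feats)
    pooled = [sum(f[k] for f in feats) / n for k in range(len(w_feat))]
    logit = sum(w * v for w, v in zip(w_out, pooled))
    return 1.0 / (1.0 + math.exp(-logit))


# Toy data: three flattened 5x5 patches and random placeholder weights.
rng = random.Random(0)
patches = [[rng.random() for _ in range(25)] for _ in range(3)]
w_feat = [[rng.uniform(-1, 1) for _ in range(25)] for _ in range(8)]
w_out = [rng.uniform(-1, 1) for _ in range(8)]

baseline_score = pairwise_average_similarity(patches)
direct_score = multi_patch_similarity(patches, w_feat, w_out)
```

Averaging the per-patch features makes the sketched score independent of patch order and lets the same function accept any number of views. In a learned setting, the weights would be fitted to labeled matching and non-matching patch sets rather than drawn at random.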