Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions
This work addresses the memory and speed limitations of Neighbourhood Consensus Networks for image correspondence estimation, a core task in computer vision applications such as visual localisation.
The paper tackles the problem of estimating accurately localised correspondences between image pairs. It modifies Neighbourhood Consensus Networks to cut memory use and execution time by more than 10x with equivalent results, and it obtains state-of-the-art results on the HPatches Sequences and InLoc benchmarks.
In this work we target the problem of estimating accurately localised correspondences between a pair of images. We adopt the recent Neighbourhood Consensus Networks, which have demonstrated promising performance on difficult correspondence problems, and propose modifications to overcome their main limitations: large memory consumption, long inference time and poorly localised correspondences. Our proposed modifications can reduce the memory footprint and execution time by more than $10\times$, with equivalent results. This is achieved by sparsifying the correlation tensor containing tentative matches and subsequently processing it with a 4D CNN using submanifold sparse convolutions. Localisation accuracy is significantly improved by processing the input images at higher resolution, which is possible due to the reduced memory footprint, and by a novel two-stage correspondence relocalisation module. The proposed Sparse-NCNet method obtains state-of-the-art results on the HPatches Sequences and InLoc visual localisation benchmarks, and competitive results on the Aachen Day-Night benchmark.
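To make the sparsification step concrete, the following is a minimal NumPy sketch of one way a dense 4D correlation tensor could be reduced to a sparse set of tentative matches. The abstract does not specify the selection rule, so keeping the `k` strongest matches per source location is an assumption made here for illustration; the function name `sparsify_correlation`, its parameters, and the coordinate/value output layout are likewise hypothetical and are not taken from the paper.

```python
import numpy as np

def sparsify_correlation(feat_a, feat_b, k=2):
    """Keep the k strongest matches in image B for each location in image A.

    feat_a: array of shape (C, Ha, Wa), dense features of image A.
    feat_b: array of shape (C, Hb, Wb), dense features of image B.
    Returns (coords, values): coords has shape (Ha*Wa*k, 4) holding
    (i_a, j_a, i_b, j_b) indices, values holds the matching scores.
    The top-k rule is an illustrative assumption, not the paper's stated method.
    """
    c, ha, wa = feat_a.shape
    _, hb, wb = feat_b.shape
    a = feat_a.reshape(c, ha * wa)
    b = feat_b.reshape(c, hb * wb)

    # Dense correlation: one score per (location in A, location in B) pair.
    corr = a.T @ b  # shape (Ha*Wa, Hb*Wb)

    # Column indices of the k largest scores in each row (unordered within the k).
    top = np.argpartition(-corr, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(ha * wa), k)
    cols = top.ravel()
    values = corr[rows, cols]

    # Convert flat indices back to 4D coordinates of the correlation tensor.
    ia, ja = np.unravel_index(rows, (ha, wa))
    ib, jb = np.unravel_index(cols, (hb, wb))
    coords = np.stack([ia, ja, ib, jb], axis=1)
    return coords, values
```

Under this assumption the number of stored entries drops from Ha·Wa·Hb·Wb to Ha·Wa·k. A coordinate/value pair of this kind is the usual input format for sparse convolution libraries, which could then apply submanifold sparse convolutions only at the retained positions.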