PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors
PN-Net addresses a practical bottleneck for computer vision applications that require efficient local image matching.
The paper tackles the high computational complexity of CNN-based local image descriptors by proposing PN-Net, which improves matching performance while significantly reducing training and execution time and keeping dimensionality low. On a GPU, the extraction time of the 128-dimensional descriptor is comparable to that of fast binary descriptors such as BRIEF and ORB.
In this paper we propose a new approach for learning local descriptors for matching image patches. It has recently been demonstrated that descriptors based on convolutional neural networks (CNN) can significantly improve matching performance. Unfortunately, their computational complexity is prohibitive for any practical application. We address this problem and propose a CNN-based descriptor with improved matching performance, significantly reduced training and execution time, and low dimensionality. We propose to train the network with triplets of patches, each of which includes a positive and a negative pair. To that end, we introduce a new loss function that exploits the relations within the triplets. We compare our approach to the recently introduced MatchNet and DeepCompare and demonstrate the advantages of our descriptor in terms of performance, memory footprint and speed: when run on a GPU, the extraction time of our 128-dimensional feature is comparable to that of the fastest available binary descriptors such as BRIEF and ORB.
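To make the idea of training on triplets concrete, the sketch below computes a generic hinge-style triplet margin loss on toy descriptors. It is a widely used stand-in and not the loss function proposed in this paper, whose exact form the abstract does not give. The names `l2_distance` and `triplet_margin_loss` and the `margin` parameter are assumptions introduced for illustration only.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic hinge-style triplet loss (illustrative, not the paper's loss).

    The triplet contains one positive pair (anchor, positive) and one
    negative pair (anchor, negative). The loss is zero once the negative
    is farther from the anchor than the positive by at least `margin`.
    """
    d_pos = l2_distance(anchor, positive)
    d_neg = l2_distance(anchor, negative)
    return max(0.0, margin + d_pos - d_neg)


# Toy 2-D descriptors; a real descriptor here would have 128 dimensions.
well_separated = triplet_margin_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
violating = triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In this toy setting, a triplet whose negative patch is already far from the anchor contributes no loss, while a triplet whose positive patch is farther away than the negative is penalized in proportion to the violation.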