Working hard to know your neighbor's margins: Local descriptor learning loss
This work addresses the need for efficient and high-performing local descriptors in computer vision tasks like stereo matching and retrieval, offering an incremental improvement over existing methods.
The paper tackles the problem of learning local feature descriptors by introducing a novel loss function inspired by Lowe's matching criterion, which maximizes the distance between the closest positive and negative patches in a batch. The result is a compact 128-dimensional descriptor that achieves state-of-the-art performance in wide baseline stereo, patch verification, and instance retrieval benchmarks, with computation taking about 1 millisecond on a low-end GPU.
We introduce a novel loss for learning local feature descriptors, inspired by Lowe's matching criterion for SIFT. We show that the proposed loss, which maximizes the distance between the closest positive and closest negative patch in the batch, is better than complex regularization methods; it works well for both shallow and deep convolutional network architectures. Applying the novel loss to the L2Net CNN architecture results in a compact descriptor, with the same dimensionality as SIFT (128), that shows state-of-the-art performance in wide baseline stereo, patch verification and instance retrieval benchmarks. It is fast: computing a descriptor takes about 1 millisecond on a low-end GPU.
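
The batch-level idea in the abstract can be illustrated with a small sketch. This is one possible reading, not the authors' implementation. The hinge form, the margin value of 1.0, the Euclidean distance, and the names `euclidean` and `hardest_in_batch_loss` are all assumptions made for illustration. Row i of `anchors` and row i of `positives` are taken to be descriptors of the same physical point. Every other row in the batch acts as a candidate negative.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptors given as equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hardest_in_batch_loss(anchors, positives, margin=1.0):
    """Mean hinge loss over a batch of matching descriptor pairs.

    anchors[i] and positives[i] are assumed to describe the same point.
    The batch must contain at least two pairs so that negatives exist.
    The margin of 1.0 is an assumed default, not a value from the abstract.
    """
    n = len(anchors)
    # dist[i][j] is the distance between anchor i and positive j.
    dist = [[euclidean(a, p) for p in positives] for a in anchors]

    total = 0.0
    for i in range(n):
        pos = dist[i][i]  # distance of the matching pair
        # Closest non-matching descriptor to anchor i (row i, off-diagonal).
        closest_to_anchor = min(dist[i][j] for j in range(n) if j != i)
        # Closest non-matching descriptor to positive i (column i, off-diagonal).
        closest_to_positive = min(dist[k][i] for k in range(n) if k != i)
        neg = min(closest_to_anchor, closest_to_positive)
        # Penalize pairs whose match is not closer than the hardest negative by the margin.
        total += max(0.0, margin + pos - neg)
    return total / n


# Two toy 2-D descriptors per side (real descriptors would be 128-D).
separated = hardest_in_batch_loss([[1.0, 0.0], [0.0, 1.0]],
                                  [[1.0, 0.0], [0.0, 1.0]])
confused = hardest_in_batch_loss([[1.0, 0.0], [0.0, 1.0]],
                                 [[0.0, 1.0], [1.0, 0.0]])
print(separated, confused)
```

In the first toy batch each match is at distance 0 and the nearest negative is at distance sqrt(2), so the loss is zero. In the second batch the pairs are swapped, so each match is farther than its nearest negative and the loss is large. Picking only the closest negative per pair is what ties this sketch to the nearest-versus-second-nearest comparison in Lowe's criterion.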