Repeatability Is Not Enough: Learning Affine Regions via Discriminability
This work addresses the challenge of learning discriminative local features for computer vision tasks like image retrieval and stereo matching, representing an incremental improvement over existing methods.
The authors tackled the problem that maximizing geometric repeatability does not lead to reliably matched local affine-covariant regions, proposing a novel hard negative-constant loss function for learning such regions. The resulting AffNet estimator outperformed state-of-the-art methods in bag-of-words image retrieval and wide baseline stereo.
A method for learning local affine-covariant regions is presented. We show that maximizing geometric repeatability does not lead to local regions, a.k.a. features, that are reliably matched, and that this necessitates descriptor-based learning. We explore factors that influence such learning and registration: the loss function, descriptor type, geometric parametrization, and the trade-off between matchability and geometric accuracy. We propose a novel hard negative-constant loss function for learning affine regions. The affine shape estimator, AffNet, trained with the hard negative-constant loss outperforms the state of the art in bag-of-words image retrieval and wide baseline stereo. The proposed training process does not require precisely geometrically aligned patches. The source code and trained weights are available at https://github.com/ducha-aiki/affnet
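
The abstract names the hard negative-constant loss without defining it. The sketch below is one plausible reading, not the authors' implementation: it assumes a triplet margin loss over descriptor pairs in which the hardest in-batch negative distance is treated as a constant, so only the positive distance contributes to the gradient. The function names, the margin of 1.0, and the use of plain Python lists as descriptors are illustrative assumptions.

```python
import math


def l2(u, v):
    """Euclidean distance between two descriptors given as lists of floats."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def hard_neg_const_loss(anchors, positives, margin=1.0):
    """Return (mean loss, gradient w.r.t. each anchor descriptor).

    Assumed form: mean_i max(0, margin + d(a_i, p_i) - d_neg_i), where
    d_neg_i is the closest non-matching distance in the batch.
    Requires at least two descriptor pairs.
    """
    n = len(anchors)
    total = 0.0
    grads = []
    for i in range(n):
        d_pos = l2(anchors[i], positives[i])
        # Hardest negative: closest non-matching descriptor in either direction.
        d_neg = min(
            min(l2(anchors[i], positives[j]), l2(anchors[j], positives[i]))
            for j in range(n)
            if j != i
        )
        loss_i = max(0.0, margin + d_pos - d_neg)
        total += loss_i
        if loss_i > 0.0 and d_pos > 0.0:
            # Assumption: d_neg is held constant, so the gradient comes
            # from the positive distance only.
            g = [(a - p) / (d_pos * n) for a, p in zip(anchors[i], positives[i])]
        else:
            g = [0.0] * len(anchors[i])
        grads.append(g)
    return total / n, grads


# Hypothetical toy batch of two 2-D descriptor pairs.
loss, grads = hard_neg_const_loss([[0.0, 0.0], [1.0, 0.0]],
                                  [[0.0, 1.0], [1.0, 1.0]])
```

In an autograd framework, the same effect would typically be obtained by detaching the negative-distance term from the computation graph before forming the loss.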