A Theory of Local Matching: SIFT and Beyond
This work provides a theoretical foundation for local descriptors in computer vision that could benefit applications such as image matching and object recognition, though it appears incremental because it builds on existing SIFT methods.
The authors tackle the problem of explaining why SIFT and its extension DSP-SIFT succeed at visual matching. They construct a general theory based on energy minimization and heat diffusion, show that DSP-SIFT approximates the theoretical solution more closely than SIFT does, and derive new descriptors that have fewer parameters and may handle affine deformations better.
Why has SIFT been so successful? Why can its extension, DSP-SIFT, further improve on SIFT? Is there a theory that can explain both? How can such a theory benefit real applications? Can it suggest new algorithms with reduced computational complexity, or new descriptors with better matching accuracy? We construct a general theory of local descriptors for visual matching. Our theory relies on concepts from energy minimization and heat diffusion. We show that SIFT and DSP-SIFT approximate the solution the theory suggests. In particular, DSP-SIFT gives a better approximation to the theoretical solution, which explains why DSP-SIFT outperforms SIFT. Using the developed theory, we derive new descriptors that have fewer parameters and are potentially better at handling affine deformations.
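SIFT computes its descriptors on a Gaussian scale space, and a standard textbook fact connects that smoothing to heat diffusion: evolving an image under the heat equation u_t = Δu for time t is equivalent to convolving it with a Gaussian of variance σ² = 2t along each axis. The minimal sketch below illustrates only that general connection with an explicit finite-difference scheme. It is not the authors' energy formulation or any of their descriptors, and the function name `heat_diffuse` and its `max_dt` parameter are hypothetical choices made for this illustration.

```python
import math

import numpy as np


def heat_diffuse(image: np.ndarray, t: float, max_dt: float = 0.2) -> np.ndarray:
    """Approximate the heat equation u_t = Laplacian(u) up to time t.

    Hypothetical helper for illustration: explicit Euler time steps with a
    5-point Laplacian (unit grid spacing) and edge-replicated padding, which
    gives zero-flux boundaries so total intensity is conserved.
    """
    if max_dt > 0.25:
        # The explicit 2-D scheme is only stable for dt <= 1/4.
        raise ValueError("max_dt must be <= 0.25 for a stable explicit 2-D scheme")
    n_steps = max(1, math.ceil(t / max_dt))
    dt = t / n_steps  # equal steps that sum exactly to t
    u = image.astype(float).copy()
    for _ in range(n_steps):
        p = np.pad(u, 1, mode="edge")
        # 5-point Laplacian: up + down + left + right - 4 * center
        lap = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
        u += dt * lap
    return u


if __name__ == "__main__":
    n = 65
    impulse = np.zeros((n, n))
    impulse[n // 2, n // 2] = 1.0
    u = heat_diffuse(impulse, t=4.0)
    offsets = np.arange(n) - n // 2
    # Spread along x of the diffused impulse; theory predicts 2t = 8.
    var_x = float((u.sum(axis=0) * offsets**2).sum())
    print(round(float(u.sum()), 6), round(var_x, 3))  # expected: 1.0 8.0
```

Each explicit step convolves the image with a small kernel whose variance along each axis is 2·dt, and variances add under repeated convolution, so after total time t the spread of a diffused impulse is 2t per axis. The result matches a Gaussian blur with σ = √(2t) (about 2.83 pixels here). Smaller steps change the cost of the computation, not this variance.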