EDFFDNet: Towards Accurate and Efficient Unsupervised Multi-Grid Image Registration
This work addresses image registration in real-world scenes with depth variations, reporting significant computational savings and accuracy gains, though the contribution appears incremental because it builds on existing deep learning approaches.
The paper tackles accurate and efficient unsupervised multi-grid image registration for scenes with depth disparities. It proposes EDFFDNet, which reduces parameters, memory, and total runtime by 70.5%, 32.6%, and 33.7%, respectively, while achieving a 0.5 dB PSNR gain over the state-of-the-art method.
Previous deep image registration methods that employ single homography, multi-grid homography, or thin-plate spline often struggle with real scenes containing depth disparities due to their inherent limitations. To address this, we propose an Exponential-Decay Free-Form Deformation Network (EDFFDNet), which employs free-form deformation with an exponential-decay basis function. This design achieves higher efficiency and performs well in scenes with depth disparities, benefiting from its inherent locality. We also introduce an Adaptive Sparse Motion Aggregator (ASMA), which replaces the MLP motion aggregator used in previous methods. By transforming dense interactions into sparse ones, ASMA reduces parameters and improves accuracy. Additionally, we propose a progressive correlation refinement strategy that leverages global-local correlation patterns for coarse-to-fine motion estimation, further enhancing efficiency and accuracy. Experiments demonstrate that EDFFDNet reduces parameters, memory, and total runtime by 70.5%, 32.6%, and 33.7%, respectively, while achieving a 0.5 dB PSNR gain over the state-of-the-art method. With an additional local refinement stage, EDFFDNet-2 further improves PSNR by 1.06 dB while maintaining lower computational costs. Our method also demonstrates strong generalization ability across datasets, outperforming previous deep learning methods.
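To make the locality argument concrete, the following is a minimal sketch of free-form deformation in which each control point's influence decays exponentially with distance. The abstract does not give the exact basis function, so the form exp(-distance / sigma), the function names `exp_decay_weight` and `ffd_displacement`, and the toy 3x3 control grid are all assumptions chosen for illustration, not the authors' formulation.

```python
import math

def exp_decay_weight(px, py, cx, cy, sigma):
    """Assumed basis: weight falls off exponentially with distance to the control point."""
    dist = math.hypot(px - cx, py - cy)
    return math.exp(-dist / sigma)

def ffd_displacement(px, py, control_points, offsets, sigma=1.0):
    """Displacement at (px, py) as a weighted sum of control-point offsets."""
    dx = dy = 0.0
    for (cx, cy), (ox, oy) in zip(control_points, offsets):
        w = exp_decay_weight(px, py, cx, cy, sigma)
        dx += w * ox
        dy += w * oy
    return dx, dy

# Hypothetical 3x3 control grid with spacing 8; only the center point is moved.
control_points = [(x, y) for y in (0, 8, 16) for x in (0, 8, 16)]
offsets = [(0.0, 0.0)] * 9
offsets[4] = (2.0, -1.0)  # center control point at (8, 8)

near = ffd_displacement(8, 8, control_points, offsets, sigma=2.0)
far = ffd_displacement(0, 0, control_points, offsets, sigma=2.0)
print("at the moved control point:", near)
print("one grid cell away diagonally:", far)
```

In this toy setup the pixel at the moved control point receives the full offset, while a pixel one grid cell away diagonally is almost unaffected. This is the kind of local behavior the abstract credits for handling depth disparities, in contrast to a single homography, whose parameters affect every pixel.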