CV · Jun 3, 2025

Guiding Registration with Emergent Similarity from Pre-Trained Diffusion Models

arXiv:2506.02419v1 · 5 citations · h-index: 9 · Has Code · MICCAI
Originality: Incremental advance
AI Analysis

This work addresses anatomically inaccurate alignments in medical image registration, in both multimodal and monomodal settings, offering a domain-specific improvement.

The paper tackles deformable image registration in challenging scenarios where intensity-based similarity losses fail, such as when an anatomy is present in one image but absent in the other. It uses features from pre-trained diffusion models as a similarity measure to guide registration networks, and reports superior performance on multimodal 2D and monomodal 3D medical image registration tasks.

Diffusion models, while trained for image generation, have emerged as powerful foundational feature extractors for downstream tasks. We find that off-the-shelf diffusion models, trained exclusively to generate natural RGB images, can identify semantically meaningful correspondences in medical images. Building on this observation, we propose to leverage diffusion model features as a similarity measure to guide deformable image registration networks. We show that common intensity-based similarity losses often fail in challenging scenarios, such as when certain anatomies are visible in one image but absent in another, leading to anatomically inaccurate alignments. In contrast, our method identifies true semantic correspondences, aligning meaningful structures while disregarding those not present across images. We demonstrate superior performance of our approach on two tasks: multimodal 2D registration (DXA to X-Ray) and monomodal 3D registration (brain-extracted to non-brain-extracted MRI). Code: https://github.com/uncbiag/dgir
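The idea of using features as a similarity measure can be illustrated with a minimal sketch. The abstract does not state the exact form of the loss, so the cosine-distance formulation below is an assumption, and the function names (`cosine_similarity`, `feature_similarity_loss`) are hypothetical. The feature vectors are toy stand-ins for what a pre-trained diffusion model might produce at each image location; feature extraction and image warping are not shown.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def feature_similarity_loss(fixed_feats, warped_feats):
    """Mean cosine distance over corresponding locations.

    Assumption: each argument is a list of per-location feature vectors,
    one from the fixed image and one from the warped moving image.
    This is an illustrative loss, not the paper's confirmed formulation.
    """
    if len(fixed_feats) != len(warped_feats):
        raise ValueError("feature maps must cover the same locations")
    total = sum(
        1.0 - cosine_similarity(f, w)
        for f, w in zip(fixed_feats, warped_feats)
    )
    return total / len(fixed_feats)


# Toy features standing in for diffusion-model features (hypothetical values).
fixed = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[2.0, 0.0], [0.0, 3.0]]     # same directions, different magnitudes
misaligned = [[0.0, 1.0], [1.0, 0.0]]  # features swapped between locations

loss_aligned = feature_similarity_loss(fixed, aligned)
loss_misaligned = feature_similarity_loss(fixed, misaligned)
print(loss_aligned < loss_misaligned)
```

Because cosine distance ignores feature magnitude, a loss of this kind is insensitive to intensity differences between modalities, which is one plausible reason a feature-based measure can succeed where intensity-based losses fail. In a real registration network, a loss like this would be minimized with respect to the predicted deformation, typically alongside a smoothness regularizer.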

