Adaptive Domain Shift in Diffusion Models for Cross-Modality Image Translation
This work addresses inefficient cross-modal image translation, a problem relevant to researchers and practitioners in fields such as medical imaging and remote sensing, though it appears incremental because it builds on existing diffusion model frameworks.
The paper targets brittle, inefficient cross-modal image translation in diffusion models and attributes it to fixed-schedule domain transfer, which pushes sampling off-manifold and invites semantic drift. The reported result is improved structural fidelity and semantic consistency with fewer denoising steps across medical imaging, remote sensing, and electroluminescence tasks.
Cross-modal image translation remains brittle and inefficient. Standard diffusion approaches often rely on a single, global linear transfer between domains. We find that this shortcut forces the sampler to traverse off-manifold, high-cost regions, inflating the correction burden and inviting semantic drift. We refer to this shared failure mode as fixed-schedule domain transfer. In this paper, we embed domain-shift dynamics directly into the generative process. Our model predicts a spatially varying mixing field at every reverse step and injects an explicit, target-consistent restoration term into the drift. This in-step guidance keeps large updates on-manifold and shifts the model's role from global alignment to local residual correction. We provide a continuous-time formulation with an exact solution form and derive a practical first-order sampler that preserves marginal consistency. Empirically, across translation tasks in medical imaging, remote sensing, and electroluminescence semantic mapping, our framework improves structural fidelity and semantic consistency while converging in fewer denoising steps.
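To make the contrast in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It compares a single global linear transfer with one first-order reverse step whose drift contains a per-pixel restoration term. The function names, the drift form `mixing_field * (target_estimate - x_t) + residual`, and the toy arrays are all assumptions for illustration. In the paper's setting the mixing field and the residual would presumably be predicted by the model at each reverse step; here they are hand-set placeholders.

```python
import numpy as np


def global_linear_mix(x_src, x_tgt, t):
    """Fixed-schedule transfer: one scalar weight t in [0, 1] shared by all pixels."""
    return (1.0 - t) * x_src + t * x_tgt


def adaptive_reverse_step(x_t, target_estimate, mixing_field, residual, step_size):
    """One explicit Euler step with a per-pixel restoration term in the drift.

    Hypothetical form, assumed for illustration only:
        drift = mixing_field * (target_estimate - x_t) + residual
    """
    # Keep the per-pixel weights in [0, 1] so the restoration never overshoots
    # the target estimate when step_size <= 1.
    mixing_field = np.clip(mixing_field, 0.0, 1.0)
    restoration = mixing_field * (target_estimate - x_t)
    return x_t + step_size * (restoration + residual)


# Toy 4x4 "images"; a trained network would supply these quantities in practice.
rng = np.random.default_rng(0)
x_t = rng.normal(size=(4, 4))
target_estimate = rng.normal(size=(4, 4))

# Spatially varying field: fully restore the top half, leave the bottom half alone.
mixing_field = np.zeros((4, 4))
mixing_field[:2, :] = 1.0
residual = np.zeros((4, 4))  # local residual correction, zero in this toy case

x_next = adaptive_reverse_step(x_t, target_estimate, mixing_field, residual, step_size=1.0)
```

With these placeholder inputs, the top two rows of `x_next` land on `target_estimate` while the bottom two rows stay at `x_t`, which is the kind of region-dependent update a single scalar `t` in `global_linear_mix` cannot express.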