TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation
This work addresses the challenge of interpreting infrared videos for human and machine applications, offering incremental improvements in video translation and object detection tasks.
The paper tackles the problem of translating infrared videos, which lack visual detail, into visible ones. It proposes TC-PDM, a diffusion-based method that improves temporal consistency and semantic preservation, yielding a 35.3% improvement in FVD for video translation and a 6.1% increase in AP50 for object detection.
Infrared imaging offers resilience against changing lighting conditions by capturing object temperatures. Yet, in a few scenarios, its lack of visual detail compared to daytime visible images poses a significant challenge for human and machine interpretation. This paper proposes a novel diffusion method, dubbed Temporally Consistent Patch Diffusion Models (TC-PDM), for infrared-to-visible video translation. Our method, which extends the Patch Diffusion Model, consists of two key components. First, we propose semantic-guided denoising, which leverages the strong representations of foundation models. As such, our method faithfully preserves the semantic structure of the generated visible images. Second, we propose a novel temporal blending module to guide the denoising trajectory, ensuring temporal consistency between consecutive frames. Experiments show that TC-PDM outperforms state-of-the-art methods by 35.3% in FVD for infrared-to-visible video translation and by 6.1% in AP50 for day-to-night object detection. Our code is publicly available at https://github.com/dzungdoan6/tc-pdm
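
The two headline numbers use metrics that point in opposite directions: FVD is a distance, so lower is better, while AP50 is a precision score, so higher is better. The abstract does not state how the percentages were computed, so the following is only a minimal sketch that assumes both are relative changes against a baseline. The function name `relative_improvement` and all scores below are hypothetical, chosen only to illustrate the arithmetic; they are not results from the paper.

```python
def relative_improvement(baseline: float, ours: float, lower_is_better: bool) -> float:
    """Return the relative improvement of `ours` over `baseline`, in percent.

    Assumption: the gain is measured relative to the baseline score.
    A positive result always means `ours` is better.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # For lower-is-better metrics (e.g. FVD) a drop is a gain;
    # for higher-is-better metrics (e.g. AP50) a rise is a gain.
    delta = (baseline - ours) if lower_is_better else (ours - baseline)
    return 100.0 * delta / abs(baseline)


# Hypothetical scores, not taken from the paper.
fvd_gain = relative_improvement(baseline=200.0, ours=150.0, lower_is_better=True)
ap50_gain = relative_improvement(baseline=50.0, ours=55.0, lower_is_better=False)
print(f"FVD gain: {fvd_gain:.1f}%  AP50 gain: {ap50_gain:.1f}%")
```

If the AP50 figure is instead an absolute difference in percentage points, the comparison reduces to `ours - baseline` with no division.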