CV · Aug 26, 2024

TC-PDM: Temporally Consistent Patch Diffusion Models for Infrared-to-Visible Video Translation

arXiv:2408.14227v1 · 1 citation · h-index: 8 · Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of interpreting infrared videos for human and machine applications, offering incremental improvements in video translation and object detection tasks.

The paper tackles the problem of translating infrared videos, which lack visual detail, into visible ones. It proposes TC-PDM, a diffusion-based method that improves temporal consistency and semantic preservation, yielding a 35.3% improvement in FVD for video translation and a 6.1% increase in AP50 for object detection.

Infrared imaging offers resilience against changing lighting conditions by capturing object temperatures. Yet in some scenarios, its lack of visual detail compared with daytime visible images poses a significant challenge for human and machine interpretation. This paper proposes a novel diffusion method, dubbed Temporally Consistent Patch Diffusion Models (TC-PDM), for infrared-to-visible video translation. Our method, which extends the Patch Diffusion Model, consists of two key components. First, we propose semantic-guided denoising, which leverages the strong representations of foundation models so that the generated visible images faithfully preserve the semantic structure of the input. Second, we propose a novel temporal blending module that guides the denoising trajectory to ensure temporal consistency between consecutive frames. Experiments show that TC-PDM outperforms state-of-the-art methods by 35.3% in FVD for infrared-to-visible video translation and by 6.1% in AP50 for day-to-night object detection. Our code is publicly available at https://github.com/dzungdoan6/tc-pdm

Code Implementations: 1 repo