Shu-Jie Chen

CV
h-index16
4papers
111citations
Novelty66%
AI Score33

4 Papers

CVJul 11, 2024Code
SCPNet: Unsupervised Cross-modal Homography Estimation via Intra-modal Self-supervised Learning

Runmin Zhang, Jun Ma, Si-Yuan Cao et al.

We propose a novel unsupervised cross-modal homography estimation framework based on intra-modal Self-supervised learning, Correlation, and consistent feature map Projection, namely SCPNet. The concept of intra-modal self-supervised learning is first presented to facilitate the unsupervised cross-modal homography estimation. The correlation-based homography estimation network and the consistent feature map projection are combined to form the learnable architecture of SCPNet, boosting the unsupervised learning framework. SCPNet is the first to achieve effective unsupervised homography estimation on the satellite-map image pair cross-modal dataset, GoogleMap, under [-32,+32] offset on a 128x128 image, leading the supervised approach MHN by 14.0% of mean average corner error (MACE). We further conduct extensive experiments on several cross-modal/spectral and manually-made inconsistent datasets, on which SCPNet achieves the state-of-the-art (SOTA) performance among unsupervised approaches, and owns 49.0%, 25.2%, 36.4%, and 10.7% lower MACEs than the supervised approach MHN. Source code is available at https://github.com/RM-Zhang/SCPNet.

CVMar 30, 2024
SGDFormer: One-stage Transformer-based Architecture for Cross-Spectral Stereo Image Guided Denoising

Runmin Zhang, Zhu Yu, Zehua Sheng et al.

Cross-spectral image guided denoising has shown its great potential in recovering clean images with rich details, such as using the near-infrared image to guide the denoising process of the visible one. To obtain such image pairs, a feasible and economical way is to employ a stereo system, which is widely used on mobile devices. Current works attempt to generate an aligned guidance image to handle the disparity between two images. However, due to occlusion, spectral differences and noise degradation, the aligned guidance image generally exists ghosting and artifacts, leading to an unsatisfactory denoised result. To address this issue, we propose a one-stage transformer-based architecture, named SGDFormer, for cross-spectral Stereo image Guided Denoising. The architecture integrates the correspondence modeling and feature fusion of stereo images into a unified network. Our transformer block contains a noise-robust cross-attention (NRCA) module and a spatially variant feature fusion (SVFF) module. The NRCA module captures the long-range correspondence of two images in a coarse-to-fine manner to alleviate the interference of noise. The SVFF module further enhances salient structures and suppresses harmful artifacts through dynamically selecting useful information. Thanks to the above design, our SGDFormer can restore artifact-free images with fine structures, and achieves state-of-the-art performance on various datasets. Additionally, our SGDFormer can be extended to handle other unaligned cross-model guided restoration tasks such as guided depth super-resolution.

CVJun 9, 2021
PCNet: A Structure Similarity Enhancement Method for Multispectral and Multimodal Image Registration

Si-Yuan Cao, Beinan Yu, Lun Luo et al.

Multispectral and multimodal images are of important usage in the field of multi-source visual information fusion. Due to the alternation or movement of image devices, the acquired multispectral and multimodal images are usually misaligned, and hence image registration is pre-requisite. Different from the registration of common images, the registration of multispectral or multimodal images is a challenging problem due to the nonlinear variation of intensity and gradient. To cope with this challenge, we propose the phase congruency network (PCNet) to enhance the structure similarity of multispectral or multimodal images. The images can then be aligned using the similarity-enhanced feature maps produced by the network. PCNet is constructed under the inspiration of the well-known phase congruency. The network embeds the phase congruency prior into two simple trainable layers and series of modified learnable Gabor kernels. Thanks to the prior knowledge, once trained, PCNet is applicable on a variety of multispectral and multimodal data such as flash/no-flash and RGB/NIR images without additional further tuning. The prior also makes the network lightweight. The trainable parameters of PCNet are 2400 times less than the deep-learning registration method DHN, while its registration performance surpasses DHN. Experimental results validate that PCNet outperforms current state-of-the-art conventional multimodal registration algorithms. Besides, PCNet can act as a complementary part of the deep-learning registration methods, which significantly boosts their registration accuracy. The percentage of the number of images under 1 pixel average corner error (ACE) of UDHN is raised from 0.2% to 89.9% after the processing of PCNet.

CVFeb 15, 2017
Normalized Total Gradient: A New Measure for Multispectral Image Registration

Shu-Jie Chen, Hui-Liang Shen

Image registration is a fundamental issue in multispectral image processing. In filter wheel based multispectral imaging systems, the non-coplanar placement of the filters always causes the misalignment of multiple channel images. The selective characteristic of spectral response in multispectral imaging raises two challenges to image registration. First, the intensity levels of a local region may be different in individual channel images. Second, the local intensity may vary rapidly in some channel images while keeps stationary in others. Conventional multimodal measures, such as mutual information, correlation coefficient, and correlation ratio, can register images with different regional intensity levels, but will fail in the circumstance of severe local intensity variation. In this paper, a new measure, namely normalized total gradient (NTG), is proposed for multispectral image registration. The NTG is applied on the difference between two channel images. This measure is based on the key assumption (observation) that the gradient of difference image between two aligned channel images is sparser than that between two misaligned ones. A registration framework, which incorporates image pyramid and global/local optimization, is further introduced for rigid transform. Experimental results validate that the proposed method is effective for multispectral image registration and performs better than conventional methods.