Jingwei Guan

MM
h-index: 3
Papers: 4
Citations: 21
Novelty: 45%
AI Score: 36

4 Papers

57.8 CV Mar 12
ZeroSense: How Vision matters in Long Context Compression

Yonghan Gao, Zehong Chen, Lijian Xu et al.

Recent visual-text compression (VTC) methods, typified by DeepSeek-OCR, report impressively high token compression ratios for long-context modeling tasks by rendering text as images. However, existing evaluation protocols rely heavily on downstream task performance. Such metrics fail to measure text preservation accurately because of the strong inherent linguistic priors of Multimodal Large Language Models (MLLMs). In this work, we introduce a new evaluation framework that decouples MLLMs' capabilities to faithfully assess VTC quality. Within this framework, we further introduce the ZeroSense Benchmark to ensure low semantic correlation among test samples. By eliminating contextual dependencies, our benchmark guarantees that the evaluation results reflect only VTC quality and are unaffected by the semantic inference capabilities of downstream models. Extensive experiments across multiple datasets demonstrate that VTC quality and downstream task accuracy diverge significantly, highlighting the necessity of our decoupled evaluation framework.

IV Nov 5, 2024
LDPM: Towards undersampled MRI reconstruction with MR-VAE and Latent Diffusion Prior

Xingjian Tang, Jingwei Guan, Linge Li et al.

Diffusion models, as powerful generative models, have found a wide range of applications and shown great potential in solving image reconstruction problems. Some works have attempted to solve MRI reconstruction with diffusion models, but these methods operate directly in pixel space, leading to higher computational costs for optimization and inference. Latent diffusion models, pre-trained on natural images with rich visual priors, are expected to reduce the computational cost of MRI reconstruction by operating in a lower-dimensional latent space. However, direct application to MRI reconstruction faces three key challenges: (1) absence of explicit control mechanisms for medical fidelity, (2) domain gap between natural images and MR physics, and (3) undefined data consistency in latent space. To address these challenges, a novel Latent Diffusion Prior-based undersampled MRI reconstruction (LDPM) method is proposed. It comprises: (1) a sketch-guided pipeline with a two-step reconstruction strategy, which balances perceptual quality and anatomical fidelity, (2) an MRI-optimized VAE (MR-VAE), which achieves an improvement of approximately 3.92 dB in PSNR for undersampled MRI reconstruction compared with SD-VAE, and (3) Dual-Stage Sampler, a modified version of the spaced DDPM sampler, which enforces high-fidelity reconstruction in the latent space. Experiments on the fastMRI dataset demonstrate the state-of-the-art performance of the proposed method and its robustness across various scenarios. The effectiveness of each module is also verified through ablation experiments.
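
To put the reported PSNR gain in context, the following is a minimal sketch of the standard PSNR definition, not the authors' evaluation code. The function names `psnr` and `mse_ratio_for_gain`, and the `data_range` parameter, are illustrative assumptions. It assumes both reconstructions are scored against the same reference with the same data range, in which case a PSNR difference maps directly to a ratio of mean squared errors.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(data_range ** 2 / mse)

def mse_ratio_for_gain(gain_db):
    """MSE reduction factor implied by a PSNR gain (same reference, same data range)."""
    return 10.0 ** (gain_db / 10.0)

# Toy example (hypothetical data): a constant error of 0.1 on a [0, 1] image.
ref = np.zeros((4, 4))
est = ref + 0.1
toy_psnr = psnr(ref, est)

# Factor by which MSE would shrink for the gain quoted in the abstract.
ratio = mse_ratio_for_gain(3.92)
```

Under these assumptions, a gain of about 3.92 dB corresponds to a mean squared error roughly 2.5 times lower.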

MM Apr 1, 2019
The bilateral solver for quality estimation based multi-focus image fusion

Jingwei Guan, Yibo Chen, Wai-kuen Cham

In this work, a fast Bilateral Solver for Quality Estimation Based multi-focus Image Fusion method (BS-QEBIF) is proposed. The all-in-focus image is generated as a pixel-wise weighted sum of the multi-focus source images, with their focus-level maps as weights. Since the visual quality of an image patch is highly correlated with its focus level, the focus-level maps are first obtained from visual quality scores, as pre-estimations. However, the pre-estimations are not ideal. The fast bilateral solver is therefore adopted to smooth the pre-estimations while preserving the edges in the multi-focus source images. The edge-preserving smoothed results are used as the final focus-level maps. Moreover, this work provides a confidence-map solution for unstable fusion in the boundary regions where the focus level changes. Experiments were conducted on 25 pairs of source images. The proposed BS-QEBIF outperforms 13 other fusion methods objectively and subjectively. The all-in-focus image produced by the proposed method maintains the details in the multi-focus source images well and does not suffer from any residual errors. Experimental results show that BS-QEBIF can handle the focus-level-changed boundary regions without any blocking, ringing, or blurring artifacts.
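
The weighted-sum fusion step described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `fuse_weighted`, the `eps` parameter, and the per-pixel normalization of the weights are assumptions, and the focus-level maps are taken as given (the quality estimation and bilateral-solver smoothing are not shown).

```python
import numpy as np

def fuse_weighted(sources, focus_maps, eps=1e-8):
    """Pixel-wise weighted sum of source images, using focus-level maps as weights."""
    imgs = np.stack([np.asarray(s, dtype=np.float64) for s in sources])    # (K, H, W)
    w = np.stack([np.asarray(m, dtype=np.float64) for m in focus_maps])    # (K, H, W)
    # Assumption: normalize so the weights at each pixel sum to (almost) 1.
    w = w / (w.sum(axis=0, keepdims=True) + eps)
    return (w * imgs).sum(axis=0)

# Toy example with two hypothetical 2x2 "source images" and focus-level maps.
a = np.array([[10.0, 20.0], [30.0, 40.0]])
b = np.array([[50.0, 60.0], [70.0, 80.0]])
wa = np.array([[1.0, 0.0], [0.5, 0.25]])
wb = 1.0 - wa
fused = fuse_weighted([a, b], [wa, wb])
```

Where one map is 1 and the other 0, the fused pixel is copied from the in-focus source; intermediate weights blend the two sources.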

MM Mar 28, 2019
SRDGAN: learning the noise prior for Super Resolution with Dual Generative Adversarial Networks

Jingwei Guan, Cheng Pan, Songnan Li et al.

Single Image Super Resolution (SISR) is the task of producing a high-resolution (HR) image from a given low-resolution (LR) image. It is a well-researched problem with extensive commercial applications such as digital cameras, video compression, and medical imaging. Most super resolution works focus on the feature-learning architecture, which aims to recover texture details as closely as possible. However, these works suffer from the following challenges: (1) The LR training images are artificially synthesized from HR images with bicubic downsampling and contain much richer information than real demosaiced and upscaled mobile images. The mismatch between training data and mobile inference data heavily hinders the improvement of practical super resolution algorithms. (2) These methods cannot effectively handle blind distortions during super resolution in practical applications. In this work, a novel end-to-end framework, including a high-to-low network and a low-to-high network, is proposed to solve the above problems with dual Generative Adversarial Networks (GANs). First, the above mismatch problems are explored with the high-to-low network, which can generate pairs of clear high-resolution images and corresponding realistic low-resolution images. Moreover, a large-scale General Mobile Super Resolution Dataset, GMSR, is proposed, which can be used for training or as a fair comparison benchmark for super resolution methods. Second, an effective low-to-high network (super resolution network) is proposed in the framework. Benefiting from the GMSR dataset and novel training strategies, the super resolution model can effectively handle detail recovery and denoising at the same time.