Lijian Yang

CV
h-index10
3papers
39citations
Novelty33%
AI Score26

3 Papers

CVNov 15, 2024Code
Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion

Dan He, Guofen Wang, Weisheng Li et al.

Multimodal image fusion (MMIF) aims to integrate information from different modalities to obtain a comprehensive image, aiding downstream tasks. However, existing methods tend to prioritize natural image fusion and focus on information complementary and network training strategies. They ignore the essential distinction between natural and medical image fusion and the influence of underlying components. This paper dissects the significant differences between the two tasks regarding fusion goals, statistical properties, and data distribution. Based on this, we rethink the suitability of the normalization strategy and convolutional kernels for end-to-end MMIF.Specifically, this paper proposes a mixture of instance normalization and group normalization to preserve sample independence and reinforce intrinsic feature correlation.This strategy promotes the potential of enriching feature maps, thus boosting fusion performance. To this end, we further introduce the large kernel convolution, effectively expanding receptive fields and enhancing the preservation of image detail. Moreover, the proposed multipath adaptive fusion module recalibrates the decoder input with features of various scales and receptive fields, ensuring the transmission of crucial information. Extensive experiments demonstrate that our method exhibits state-of-the-art performance in multiple fusion tasks and significantly improves downstream applications. The code is available at https://github.com/HeDan-11/LKC-FUNet.

CVApr 11, 2025
Generative AI for Film Creation: A Survey of Recent Advances

Ruihan Zhang, Borou Yu, Jiajian Min et al.

Generative AI (GenAI) is transforming filmmaking, equipping artists with tools like text-to-image and image-to-video diffusion, neural radiance fields, avatar generation, and 3D synthesis. This paper examines the adoption of these technologies in filmmaking, analyzing workflows from recent AI-driven films to understand how GenAI contributes to character creation, aesthetic styling, and narration. We explore key strategies for maintaining character consistency, achieving stylistic coherence, and ensuring motion continuity. Additionally, we highlight emerging trends such as the growing use of 3D generation and the integration of real footage with AI-generated elements. Beyond technical advancements, we examine how GenAI is enabling new artistic expressions, from generating hard-to-shoot footage to dreamlike diffusion-based morphing effects, abstract visuals, and unworldly objects. We also gather artists' feedback on challenges and desired improvements, including consistency, controllability, fine-grained editing, and motion refinement. Our study provides insights into the evolving intersection of AI and filmmaking, offering a roadmap for researchers and artists navigating this rapidly expanding field.

CVMay 5, 2024
Score-based Generative Priors Guided Model-driven Network for MRI Reconstruction

Xiaoyu Qiao, Weisheng Li, Bin Xiao et al.

Score matching with Langevin dynamics (SMLD) method has been successfully applied to accelerated MRI. However, the hyperparameters in the sampling process require subtle tuning, otherwise the results can be severely corrupted by hallucination artifacts, especially with out-of-distribution test data. To address the limitations, we proposed a novel workflow where naive SMLD samples serve as additional priors to guide model-driven network training. First, we adopted a pretrained score network to generate samples as preliminary guidance images (PGI), obviating the need for network retraining, parameter tuning and in-distribution test data. Although PGIs are corrupted by hallucination artifacts, we believe they can provide extra information through effective denoising steps to facilitate reconstruction. Therefore, we designed a denoising module (DM) in the second step to coarsely eliminate artifacts and noises from PGIs. The features are extracted from a score-based information extractor (SIE) and a cross-domain information extractor (CIE), which directly map to the noise patterns. Third, we designed a model-driven network guided by denoised PGIs (DGIs) to further recover fine details. DGIs are densely connected with intermediate reconstructions in each cascade to enrich the information and are periodically updated to provide more accurate guidance. Our experiments on different datasets reveal that despite the low average quality of PGIs, the proposed workflow can effectively extract valuable information to guide the network training, even with severely reduced training data and sampling steps. Our method outperforms other cutting-edge techniques by effectively mitigating hallucination artifacts, yielding robust and high-quality reconstruction results.