Leiye Liu

h-index: 2

5 papers

28 citations

Novelty: 53%

AI Score: 55

Ranked #8,826 of 194,257 authors (top 5%); #3,355 in CV (top 6%)

5 Papers

10.2 | LG | Jun 28 | Code

AdaSurvMamba: Dynamic Fusion and Semantic Scanning for Multimodal Survival Analysis

Jialong Zhong, Tingwei Liu, Baokun Yue et al.

Multimodal survival analysis utilizing whole slide images (WSIs) and genomic profiles is fundamental for cancer prognosis. Recently, state-space models like Mamba have emerged as powerful tools for sequence modeling. However, translating this success to complex multimodal tasks is hindered by two critical limitations. First, conventional fusion strategies assume a static multimodal interaction strength, ignoring the fluctuating diagnostic importance of each modality across different patients and local regions. Second, the standard Mamba architecture processes tokens along predefined physical paths. This rigid scanning disrupts the semantic continuity of spatially scattered medical features and exacerbates long-range decay. To address these challenges, we introduce AdaSurvMamba as a novel adaptive framework for multimodal survival analysis. The framework features a Dual-Scale Importance-Aware Reconstruction (DSIR) module to dynamically modulate cross-modal interaction strength. It evaluates diagnostic importance at both the sequence and token levels to reconstruct the input representations. Furthermore, we propose a Semantic Aggregation Scanning (SAS) module to overcome contextual fragmentation. The SAS module dynamically reorganizes discrete tokens into semantically continuous sequences via a shared prototype pool. It explicitly modulates the state transition step size using global modality context and semantic priors to adaptively control the information absorption rate. Experiments across five TCGA cohorts demonstrate consistent gains over existing methods. Code is available at https://github.com/zjlGO/AdaSurvMamba.
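
The abstract's point that the state transition step size controls the information absorption rate can be illustrated with the generic discretized recurrence used in Mamba-style models. The following is a minimal scalar sketch, not the SAS module itself: the function name `ssm_scan` and the scalar parameters `a` and `b` are hypothetical, and the per-token step sizes are supplied by hand rather than predicted from modality context.

```python
import math


def ssm_scan(xs, deltas, a=-1.0, b=1.0):
    """Scalar state-space recurrence with a per-token step size.

    h_t = exp(delta_t * a) * h_{t-1} + delta_t * b * x_t

    `a` and `b` are illustrative constants (assumptions); in a learned
    model they would be parameters and `deltas` would be predicted.
    """
    h = 0.0
    states = []
    for x, delta in zip(xs, deltas):
        decay = math.exp(delta * a)    # fraction of the old state that survives
        h = decay * h + delta * b * x  # larger delta absorbs more of the new token
        states.append(h)
    return states


if __name__ == "__main__":
    tokens = [1.0, 0.0]
    print(ssm_scan(tokens, [0.1, 0.1]))  # small steps: weak absorption, slow forgetting
    print(ssm_scan(tokens, [1.0, 1.0]))  # large steps: strong absorption, fast forgetting
```

With a negative `a`, a small step size keeps the previous state nearly intact and takes in little of the current token, while a large step size does the opposite. A step size of zero leaves the state unchanged.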

8.6 | CV | Mar 15 | Code

Selective Noise Suppression and Discriminative Mutual Interaction for Robust Audio-Visual Segmentation

Kai Peng, Yunzhe Shen, Miao Zhang et al.

The ability to capture and segment sounding objects in dynamic visual scenes is crucial for Audio-Visual Segmentation (AVS). While significant progress has been made in this area, the interaction between audio and visual modalities still requires further exploration. In this work, we aim to answer the following questions: How can a model effectively suppress audio noise while enhancing relevant audio information? How can we achieve discriminative interaction between the audio and visual modalities? To this end, we propose SDAVS, equipped with the Selective Noise-Resilient Processor (SNRP) module and the Discriminative Audio-Visual Mutual Fusion (DAMF) strategy. The proposed SNRP mitigates audio noise interference by selectively emphasizing relevant auditory cues, while DAMF ensures more consistent audio-visual representations. Experimental results demonstrate that our proposed method achieves state-of-the-art performance on benchmark AVS datasets, especially in multi-source and complex scenes. The code and model are available at https://github.com/happylife-pk/SDAVS.

19.7 | CV | Apr 8, 2025 | Code

DefMamba: Deformable Visual State Space Model

Leiye Liu, Miao Zhang, Jihao Yin et al.

Recently, state space models (SSMs), particularly Mamba, have attracted significant attention from scholars due to their ability to effectively balance computational efficiency and performance. However, most existing visual Mamba methods flatten images into 1D sequences using predefined scan orders, which makes the model less capable of utilizing the spatial structural information of the image during feature extraction. To address this issue, we propose a novel visual foundation model called DefMamba. This model includes a multi-scale backbone structure and deformable Mamba (DM) blocks, which dynamically adjust the scanning path to prioritize important information, thus enhancing the capture and processing of relevant input features. By incorporating a deformable scanning (DS) strategy, this model significantly improves its ability to learn image structures and detect changes in object details. Extensive experiments show that DefMamba achieves state-of-the-art performance in various visual tasks, including image classification, object detection, instance segmentation, and semantic segmentation. The code is open source on DefMamba.
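
To make the contrast between a predefined scan order and a data-dependent one concrete, here is a toy sketch. It is not the DS strategy from the paper: the function names and the per-token `offsets` are hypothetical, and the offsets are given by hand where a learned model would presumably predict them from image content.

```python
def raster_scan(grid):
    """Flatten a 2D grid row by row: the fixed, predefined order."""
    return [value for row in grid for value in row]


def offset_scan(grid, offsets):
    """Flatten a 2D grid in an order shifted by per-token offsets.

    Each token's scan position is its raster index plus a hypothetical
    offset (an assumption for illustration). Tokens are visited in order
    of shifted position, with ties broken by raster index.
    """
    flat = raster_scan(grid)
    flat_offsets = raster_scan(offsets)
    order = sorted(range(len(flat)), key=lambda i: (i + flat_offsets[i], i))
    return [flat[i] for i in order]


if __name__ == "__main__":
    grid = [[1, 2], [3, 4]]
    print(raster_scan(grid))
    # Pull the token at raster index 2 to the front of the sequence.
    print(offset_scan(grid, [[0, 0], [-2.5, 0]]))
```

With all offsets at zero, `offset_scan` reduces to the raster order, so the fixed scan is a special case of the adjustable one.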

3.6 | CV | Sep 23, 2025 | Code

Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation

Yunzhe Shen, Kai Peng, Leiye Liu et al.

Audio-visual segmentation (AVS) plays a critical role in multimodal machine learning by effectively integrating audio and visual cues to precisely segment objects or regions within visual scenes. Recent AVS methods have demonstrated significant improvements. However, they overlook the inherent frequency-domain contradictions between audio and visual modalities: pervasive interfering noise in audio high-frequency signals versus structurally rich details in visual high-frequency signals. Ignoring these differences can result in suboptimal performance. In this paper, we rethink the AVS task from a deeper perspective by reformulating it as a frequency-domain decomposition and recomposition problem. To this end, we introduce a novel Frequency-Aware Audio-Visual Segmentation (FAVS) framework consisting of two key modules: the Frequency-Domain Enhanced Decomposer (FDED) module and the Synergistic Cross-Modal Consistency (SCMC) module. The FDED module employs a residual-based iterative frequency decomposition to discriminate modality-specific semantics and structural features, and the SCMC module leverages a mixture-of-experts architecture to reinforce semantic consistency and modality-specific feature preservation through dynamic expert routing. Extensive experiments demonstrate that our FAVS framework achieves state-of-the-art performance on three benchmark datasets, and abundant qualitative visualizations further verify the effectiveness of the proposed FDED and SCMC modules. The code will be released as open source upon acceptance of the paper.
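
One common way to read "residual-based iterative frequency decomposition" is as repeated low-pass filtering, where each pass keeps the residual as a higher-frequency band. The sketch below shows that general idea on a 1D signal and is an assumption about the technique, not the FDED module: the moving-average filter, the function names, and the window `widths` are all hypothetical choices.

```python
def moving_average(signal, width):
    """Simple low-pass filter: mean over a centered window, truncated at the edges."""
    half = width // 2
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out


def residual_decompose(signal, widths=(3, 9)):
    """Split a signal into bands, ordered from high to low frequency.

    Each pass smooths the current signal and stores the residual
    (current minus smoothed) as one band. The final smoothed signal is
    the last band, so the bands sum back to the input.
    """
    bands = []
    current = list(signal)
    for width in widths:
        smooth = moving_average(current, width)
        bands.append([c - s for c, s in zip(current, smooth)])
        current = smooth
    bands.append(current)
    return bands


if __name__ == "__main__":
    signal = [0.0, 1.0, 0.0, 1.0, 4.0, 5.0, 4.0, 5.0]
    for band in residual_decompose(signal):
        print([round(v, 3) for v in band])
```

Because every band is a difference of successive smoothings, summing the bands recovers the original signal up to floating-point error, which is what makes a later recomposition step possible.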

3.6 | IV | Jun 20, 2024 | Code

CriDiff: Criss-cross Injection Diffusion Framework via Generative Pre-train for Prostate Segmentation

Tingwei Liu, Miao Zhang, Leiye Liu et al.

Recently, Diffusion Probabilistic Model (DPM)-based methods have achieved substantial success in the field of medical image segmentation. However, most of these methods fail to enable the diffusion model to learn edge features and non-edge features effectively and to inject them efficiently into the diffusion backbone. Additionally, the domain gap between the image features and the diffusion model features poses a great challenge to prostate segmentation. In this paper, we propose CriDiff, a two-stage feature injecting framework with a Crisscross Injection Strategy (CIS) and a Generative Pre-train (GP) approach for prostate segmentation. The CIS maximizes the use of multi-level features by efficiently harnessing the complementarity of high-level and low-level features. To effectively learn multi-level edge features and non-edge features, we propose two parallel conditioners in the CIS: the Boundary Enhance Conditioner (BEC) and the Core Enhance Conditioner (CEC), which discriminatively model the image edge regions and non-edge regions, respectively. Moreover, the GP approach eases the inconsistency between the image features and the diffusion model without adding parameters. Extensive experiments on four benchmark datasets demonstrate the effectiveness of the proposed method, which achieves state-of-the-art performance on four evaluation metrics.