Tao Ye

CV
h-index9
15papers
113citations
Novelty44%
AI Score55

15 Papers

CVNov 3, 2023Code
Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion

Xilai Li, Xiaosong Li, Tao Ye et al.

Multi-modal image fusion (MMIF) integrates valuable information from different modality images into a fused one. However, the fusion of multiple visible images with different focal regions and infrared images is a unprecedented challenge in real MMIF applications. This is because of the limited depth of the focus of visible optical lenses, which impedes the simultaneous capture of the focal information within the same scene. To address this issue, in this paper, we propose a MMIF framework for joint focused integration and modalities information extraction. Specifically, a semi-sparsity-based smoothing filter is introduced to decompose the images into structure and texture components. Subsequently, a novel multi-scale operator is proposed to fuse the texture components, capable of detecting significant information by considering the pixel focus attributes and relevant data from various modal images. Additionally, to achieve an effective capture of scene luminance and reasonable contrast maintenance, we consider the distribution of energy information in the structural components in terms of multi-directional frequency variance and information entropy. Extensive experiments on existing MMIF datasets, as well as the object detection and depth estimation tasks, consistently demonstrate that the proposed algorithm can surpass the state-of-the-art methods in visual perception and quantitative evaluation. The code is available at https://github.com/ixilai/MFIF-MMIF.

CVMar 3Code
CAWM-Mamba: A unified model for infrared-visible image fusion and compound adverse weather restoration

Huichun Liu, Xiaosong Li, Zhuangfan Huang et al.

Multimodal Image Fusion (MMIF) integrates complementary information from various modalities to produce clearer and more informative fused images. MMIF under adverse weather is particularly crucial in autonomous driving and UAV monitoring applications. However, existing adverse weather fusion methods generally only tackle single types of degradation such as haze, rain, or snow, and fail when multiple degradations coexist (e.g., haze+rain, rain+snow). To address this challenge, we propose Compound Adverse Weather Mamba (CAWM-Mamba), the first end-to-end framework that jointly performs image fusion and compound weather restoration with unified shared weights. Our network contains three key components: (1) a Weather-Aware Preprocess Module (WAPM) to enhance degraded visible features and extracts global weather embeddings; (2) a Cross-modal Feature Interaction Module (CFIM) to facilitate the alignment of heterogeneous modalities and exchange of complementary features across modalities; and (3) a Wavelet Space State Block (WSSB) that leverages wavelet-domain decomposition to decouple multi-frequency degradations. WSSB includes Freq-SSM, a module that models anisotropic high-frequency degradation without redundancy, and a unified degradation representation mechanism to further improve generalization across complex compound weather conditions. Extensive experiments on the AWMM-100K benchmark and three standard fusion datasets demonstrate that CAWM-Mamba consistently outperforms state-of-the-art methods in both compound and single-weather scenarios. In addition, our fusion results excel in downstream tasks covering semantic segmentation and object detection, confirming the practical value in real-world adverse weather perception. The source code will be available at https://github.com/Feecuin/CAWM-Mamba.

CVFeb 3, 2024Code
UMCFuse: A Unified Multiple Complex Scenes Infrared and Visible Image Fusion Framework

Xilai Li, Xiaosong Li, Tianshu Tan et al.

Infrared and visible image fusion has emerged as a prominent research area in computer vision. However, little attention has been paid to the fusion task in complex scenes, leading to sub-optimal results under interference. To fill this gap, we propose a unified framework for infrared and visible images fusion in complex scenes, termed UMCFuse. Specifically, we classify the pixels of visible images from the degree of scattering of light transmission, allowing us to separate fine details from overall intensity. Maintaining a balance between interference removal and detail preservation is essential for the generalization capacity of the proposed method. Therefore, we propose an adaptive denoising strategy for the fusion of detail layers. Meanwhile, we fuse the energy features from different modalities by analyzing them from multiple directions. Extensive fusion experiments on real and synthetic complex scenes datasets cover adverse weather conditions, noise, blur, overexposure, fire, as well as downstream tasks including semantic segmentation, object detection, salient object detection, and depth estimation, consistently indicate the superiority of the proposed method compared with the recent representative methods. Our code is available at https://github.com/ixilai/UMCFuse.

CVApr 2Code
MAVFusion: Efficient Infrared and Visible Video Fusion via Motion-Aware Sparse Interaction

Xilai Li, Weijun Jiang, Xiaosong Li et al.

Infrared and visible video fusion combines the object saliency from infrared images with the texture details from visible images to produce semantically rich fusion results. However, most existing methods are designed for static image fusion and cannot effectively handle frame-to-frame motion in videos. Current video fusion methods improve temporal consistency by introducing interactions across frames, but they often require high computational cost. To mitigate these challenges, we propose MAVFusion, an end-to-end video fusion framework featuring a motion-aware sparse interaction mechanism that enhances efficiency while maintaining superior fusion quality. Specifically, we leverage optical flow to identify dynamic regions in multi-modal sequences, adaptively allocating computationally intensive cross-modal attention to these sparse areas to capture salient transitions and facilitate inter-modal information exchange. For static background regions, a lightweight weak interaction module is employed to maintain structural and appearance integrity. By decoupling the processing of dynamic and static regions, MAVFusion simultaneously preserves temporal consistency and fine-grained details while significantly accelerating inference. Extensive experiments demonstrate that MAVFusion achieves state-of-the-art performance on multiple infrared and visible video benchmarks, achieving a speed of 14.16\,FPS at $640 \times 480$ resolution. The source code will be available at https://github.com/ixilai/MAVFusion.

CVOct 28, 2025Code
A Luminance-Aware Multi-Scale Network for Polarization Image Fusion with a Multi-Scene Dataset

Zhuangfan Huang, Xiaosong Li, Gao Wang et al.

Polarization image fusion combines S0 and DOLP images to reveal surface roughness and material properties through complementary texture features, which has important applications in camouflage recognition, tissue pathology analysis, surface defect detection and other fields. To intergrate coL-Splementary information from different polarized images in complex luminance environment, we propose a luminance-aware multi-scale network (MLSN). In the encoder stage, we propose a multi-scale spatial weight matrix through a brightness-branch , which dynamically weighted inject the luminance into the feature maps, solving the problem of inherent contrast difference in polarized images. The global-local feature fusion mechanism is designed at the bottleneck layer to perform windowed self-attention computation, to balance the global context and local details through residual linking in the feature dimension restructuring stage. In the decoder stage, to further improve the adaptability to complex lighting, we propose a Brightness-Enhancement module, establishing the mapping relationship between luminance distribution and texture features, realizing the nonlinear luminance correction of the fusion result. We also present MSP, an 1000 pairs of polarized images that covers 17 types of indoor and outdoor complex lighting scenes. MSP provides four-direction polarization raw maps, solving the scarcity of high-quality datasets in polarization image fusion. Extensive experiment on MSP, PIF and GAND datasets verify that the proposed MLSN outperms the state-of-the-art methods in subjective and objective evaluations, and the MS-SSIM and SD metircs are higher than the average values of other methods by 8.57%, 60.64%, 10.26%, 63.53%, 22.21%, and 54.31%, respectively. The source code and dataset is avalable at https://github.com/1hzf/MLS-UNet.

CVAug 23, 2025Code
AWM-Fuse: Multi-Modality Image Fusion for Adverse Weather via Global and Local Text Perception

Xilai Li, Huichun Liu, Xiaosong Li et al.

Multi-modality image fusion (MMIF) in adverse weather aims to address the loss of visual information caused by weather-related degradations, providing clearer scene representations. Although less studies have attempted to incorporate textual information to improve semantic perception, they often lack effective categorization and thorough analysis of textual content. In response, we propose AWM-Fuse, a novel fusion method for adverse weather conditions, designed to handle multiple degradations through global and local text perception within a unified, shared weight architecture. In particular, a global feature perception module leverages BLIP-produced captions to extract overall scene features and identify primary degradation types, thus promoting generalization across various adverse weather conditions. Complementing this, the local module employs detailed scene descriptions produced by ChatGPT to concentrate on specific degradation effects through concrete textual cues, thereby capturing finer details. Furthermore, textual descriptions are used to constrain the generation of fusion images, effectively steering the network learning process toward better alignment with real semantic labels, thereby promoting the learning of more meaningful visual features. Extensive experiments demonstrate that AWM-Fuse outperforms current state-of-the-art methods in complex weather conditions and downstream tasks. Our code is available at https://github.com/Feecuin/AWM-Fuse.

CVDec 23, 2025
JDPNet: A Network Based on Joint Degradation Processing for Underwater Image Enhancement

Tao Ye, Hongbin Ren, Chongbing Zhang et al.

Given the complexity of underwater environments and the variability of water as a medium, underwater images are inevitably subject to various types of degradation. The degradations present nonlinear coupling rather than simple superposition, which renders the effective processing of such coupled degradations particularly challenging. Most existing methods focus on designing specific branches, modules, or strategies for specific degradations, with little attention paid to the potential information embedded in their coupling. Consequently, they struggle to effectively capture and process the nonlinear interactions of multiple degradations from a bottom-up perspective. To address this issue, we propose JDPNet, a joint degradation processing network, that mines and unifies the potential information inherent in coupled degradations within a unified framework. Specifically, we introduce a joint feature-mining module, along with a probabilistic bootstrap distribution strategy, to facilitate effective mining and unified adjustment of coupled degradation features. Furthermore, to balance color, clarity, and contrast, we design a novel AquaBalanceLoss to guide the network in learning from multiple coupled degradation losses. Experiments on six publicly available underwater datasets, as well as two new datasets constructed in this study, show that JDPNet exhibits state-of-the-art performance while offering a better tradeoff between performance, parameter size, and computational cost.

CVApr 27, 2024
MMA-UNet: A Multi-Modal Asymmetric UNet Architecture for Infrared and Visible Image Fusion

Jingxue Huang, Xilai Li, Tianshu Tan et al.

Multi-modal image fusion (MMIF) maps useful information from various modalities into the same representation space, thereby producing an informative fused image. However, the existing fusion algorithms tend to symmetrically fuse the multi-modal images, causing the loss of shallow information or bias towards a single modality in certain regions of the fusion results. In this study, we analyzed the spatial distribution differences of information in different modalities and proved that encoding features within the same network is not conducive to achieving simultaneous deep feature space alignment for multi-modal images. To overcome this issue, a Multi-Modal Asymmetric UNet (MMA-UNet) was proposed. We separately trained specialized feature encoders for different modal and implemented a cross-scale fusion strategy to maintain the features from different modalities within the same representation space, ensuring a balanced information fusion process. Furthermore, extensive fusion and downstream task experiments were conducted to demonstrate the efficiency of MMA-UNet in fusing infrared and visible image information, producing visually natural and semantically rich fusion results. Its performance surpasses that of the state-of-the-art comparison fusion methods.

LGMay 20, 2025
Enhancing Learned Knowledge in LoRA Adapters Through Efficient Contrastive Decoding on Ascend NPUs

Morgan Lindsay Heisler, Linzi Xing, Ge Shi et al.

Huawei Cloud users leverage LoRA (Low-Rank Adaptation) as an efficient and scalable method to fine-tune and customize large language models (LLMs) for application-specific needs. However, tasks that require complex reasoning or deep contextual understanding are often hindered by biases or interference from the base model when using typical decoding methods like greedy or beam search. These biases can lead to generic or task-agnostic responses from the base model instead of leveraging the LoRA-specific adaptations. In this paper, we introduce Contrastive LoRA Decoding (CoLD), a novel decoding framework designed to maximize the use of task-specific knowledge in LoRA-adapted models, resulting in better downstream performance. CoLD uses contrastive decoding by scoring candidate tokens based on the divergence between the probability distributions of a LoRA-adapted expert model and the corresponding base model. This approach prioritizes tokens that better align with the LoRA's learned representations, enhancing performance for specialized tasks. While effective, a naive implementation of CoLD is computationally expensive because each decoding step requires evaluating multiple token candidates across both models. To address this, we developed an optimized kernel for Huawei's Ascend NPU. CoLD achieves up to a 5.54% increase in task accuracy while reducing end-to-end latency by 28% compared to greedy decoding. This work provides practical and efficient decoding strategies for fine-tuned LLMs in resource-constrained environments and has broad implications for applied data science in both cloud and on-premises settings.

SDOct 15, 2020
Music Classification in MIDI Format based on LSTM Mdel

Yiting Xia, Yiwei Jiang, Tao Ye

Music classification between music made by AI or human composers can be done by deep learning networks. We first transformed music samples in midi format to natural language sequences, then classified these samples by mLSTM (multiplicative Long Short Term Memory) + logistic regression. The accuracy of the result evaluated by 10-fold cross validation can reach 90%. Our work indicates that music generated by AI and human composers do have different characteristics, which can be learned by deep learning networks.

IRJun 18, 2018
Modeling Musical Taste Evolution with Recurrent Neural Networks

Massimo Quadrana, Marta Reznakova, Tao Ye et al.

Finding the music of the moment can often be a challenging problem, even for well-versed music listeners. Musical tastes are constantly in flux, and the problem of developing computational models for musical taste dynamics presents a rich and nebulous problem space. A variety of factors all play some role in determining preferences (e.g., popularity, musicological, social, geographical, generational), and these factors vary across different listeners and contexts. In this paper, we leverage a massive dataset on internet radio station creation from a large music streaming company in order to develop computational models of listener taste evolution. We delve deep into the complexities of this domain, identifying some of the unique challenges that it presents, and develop a model utilizing recurrent neural networks. We apply our model to the problem of next station prediction and show that it not only outperforms several baselines, but excels at long tail music personalization, particularly by learning the long-term dependency structure of listener music preference evolution.

NEMay 18, 2014
A Memetic Algorithm for the Linear Ordering Problem with Cumulative Costs

Tao Ye, Kan Zhou, Zhipeng Lu et al.

This paper introduces an effective memetic algorithm for the linear ordering problem with cumulative costs. The proposed algorithm combines an order-based recombination operator with an improved forward-backward local search procedure and employs a solution quality based replacement criterion for pool updating. Extensive experiments on 118 well-known benchmark instances show that the proposed algorithm achieves competitive results by identifying 46 new upper bounds. Furthermore, some critical ingredients of our algorithm are analyzed to understand the source of its performance.

NEMay 18, 2014
A Multi-parent Memetic Algorithm for the Linear Ordering Problem

Tao Ye, Tao Wang, Zhipeng Lu et al.

In this paper, we present a multi-parent memetic algorithm (denoted by MPM) for solving the classic Linear Ordering Problem (LOP). The MPM algorithm integrates in particular a multi-parent recombination operator for generating offspring solutions and a distance-and-quality based criterion for pool updating. Our MPM algorithm is assessed on 8 sets of 484 widely used LOP instances and compared with several state-of-the-art algorithms in the literature, showing the efficacy of the MPM algorithm. Specifically, for the 255 instances whose optimal solutions are unknown, the MPM is able to detect better solutions than the previous best-known ones for 66 instances, while matching the previous best-known results for 163 instances. Furthermore, some additional experiments are carried out to analyze the key elements and important parameters of MPM.

OCJun 4, 2013
Iterated Tabu Search Algorithm for Packing Unequal Circles in a Circle

Tao Ye, Wenqi Huang, Zhipeng Lu

This paper presents an Iterated Tabu Search algorithm (denoted by ITS-PUCC) for solving the problem of Packing Unequal Circles in a Circle. The algorithm exploits the continuous and combinatorial nature of the unequal circles packing problem. It uses a continuous local optimization method to generate locally optimal packings. Meanwhile, it builds a neighborhood structure on the set of local minimum via two appropriate perturbation moves and integrates two combinatorial optimization methods, Tabu Search and Iterated Local Search, to systematically search for good local minima. Computational experiments on two sets of widely-used test instances prove its effectiveness and efficiency. For the first set of 46 instances coming from the famous circle packing contest and the second set of 24 instances widely used in the literature, the algorithm is able to discover respectively 14 and 16 better solutions than the previous best-known records.

AIMay 24, 2013
Integrating tabu search and VLSN search to develop enhanced algorithms: A case study using bipartite boolean quadratic programs

Fred Glover, Tao Ye, Abraham P. Punnen et al.

The bipartite boolean quadratic programming problem (BBQP) is a generalization of the well studied boolean quadratic programming problem. The model has a variety of real life applications; however, empirical studies of the model are not available in the literature, except in a few isolated instances. In this paper, we develop efficient heuristic algorithms based on tabu search, very large scale neighborhood (VLSN) search, and a hybrid algorithm that integrates the two. The computational study establishes that effective integration of simple tabu search with VLSN search results in superior outcomes, and suggests the value of such an integration in other settings. Complexity analysis and implementation details are provided along with conclusions drawn from experimental analysis. In addition, we obtain solutions better than the best previously known for almost all medium and large size benchmark instances.