CVJul 29, 2024Code
Towards Robust Infrared Small Target Detection: A Feature-Enhanced and Sensitivity-Tunable FrameworkJinmiao Zhao, Zelin Shi, Chuang Yu et al.
Recently, single-frame infrared small target (SIRST) detection technology has attracted widespread attention. Different from most existing deep learning-based methods that focus on improving network architectures, we propose a feature-enhanced and sensitivity-tunable (FEST) framework, which is compatible with existing SIRST detection networks and further enhances their detection performance. The FEST framework improves the model's robustness from two aspects: feature enhancement and target confidence regulation. For feature enhancement, we employ a multi-scale fusion strategy to improve the model's perception to multi-scale features of multi-size targets, and design an edge enhancement difficulty mining (EEDM) loss to guide the network to continuously focus on challenging target regions and edge features during training. For target confidence regulation, an adjustable sensitivity (AS) strategy is proposed for network post-processing. This strategy enhances the model's adaptability in complex scenarios and significantly improves the detection rate of infrared small targets while maintaining segmentation accuracy. Extensive experimental results show that our FEST framework can effectively enhance the performance of existing SIRST detection networks. The code is available at https://github.com/YuChuang1205/FEST-Framework
CVDec 10, 2025Code
Gradient-Guided Learning Network for Infrared Small Target DetectionJinmiao Zhao, Chuang Yu, Zelin Shi et al.
Recently, infrared small target detection has attracted extensive attention. However, due to the small size and the lack of intrinsic features of infrared small targets, the existing methods generally have the problem of inaccurate edge positioning and the target is easily submerged by the background. Therefore, we propose an innovative gradient-guided learning network (GGL-Net). Specifically, we are the first to explore the introduction of gradient magnitude images into the deep learning-based infrared small target detection method, which is conducive to emphasizing the edge details and alleviating the problem of inaccurate edge positioning of small targets. On this basis, we propose a novel dual-branch feature extraction network that utilizes the proposed gradient supplementary module (GSM) to encode raw gradient information into deeper network layers and embeds attention mechanisms reasonably to enhance feature extraction ability. In addition, we construct a two-way guidance fusion module (TGFM), which fully considers the characteristics of feature maps at different levels. It can facilitate the effective fusion of multi-scale feature maps and extract richer semantic information and detailed information through reasonable two-way guidance. Extensive experiments prove that GGL-Net has achieves state-of-the-art results on the public real NUAA-SIRST dataset and the public synthetic NUDT-SIRST dataset. Our code has been integrated into https://github.com/YuChuang1205/MSDA-Net
CVAug 5, 2024
LR-Net: A Lightweight and Robust Network for Infrared Small Target DetectionChuang Yu, Yunpeng Liu, Jinmiao Zhao et al.
Limited by equipment limitations and the lack of target intrinsic features, existing infrared small target detection methods have difficulty meeting actual comprehensive performance requirements. Therefore, we propose an innovative lightweight and robust network (LR-Net), which abandons the complex structure and achieves an effective balance between detection accuracy and resource consumption. Specifically, to ensure the lightweight and robustness, on the one hand, we construct a lightweight feature extraction attention (LFEA) module, which can fully extract target features and strengthen information interaction across channels. On the other hand, we construct a simple refined feature transfer (RFT) module. Compared with direct cross-layer connections, the RFT module can improve the network's feature refinement extraction capability with little resource consumption. Meanwhile, to solve the problem of small target loss in high-level feature maps, on the one hand, we propose a low-level feature distribution (LFD) strategy to use low-level features to supplement the information of high-level features. On the other hand, we introduce an efficient simplified bilinear interpolation attention module (SBAM) to promote the guidance constraints of low-level features on high-level features and the fusion of the two. In addition, We abandon the traditional resizing method and adopt a new training and inference cropping strategy, which is more robust to datasets with multi-scale samples. Extensive experimental results show that our LR-Net achieves state-of-the-art (SOTA) performance. Notably, on the basis of the proposed LR-Net, we achieve 3rd place in the "ICPR 2024 Resource-Limited Infrared Small Target Detection Challenge Track 2: Lightweight Infrared Small Target Detection".
CVAug 5, 2024
Refined Infrared Small Target Detection Scheme with Single-Point SupervisionJinmiao Zhao, Zelin Shi, Chuang Yu et al.
Recently, infrared small target detection with single-point supervision has attracted extensive attention. However, the detection accuracy of existing methods has difficulty meeting actual needs. Therefore, we propose an innovative refined infrared small target detection scheme with single-point supervision, which has excellent segmentation accuracy and detection rate. Specifically, we introduce label evolution with single point supervision (LESPS) framework and explore the performance of various excellent infrared small target detection networks based on this framework. Meanwhile, to improve the comprehensive performance, we construct a complete post-processing strategy. On the one hand, to improve the segmentation accuracy, we use a combination of test-time augmentation (TTA) and conditional random field (CRF) for post-processing. On the other hand, to improve the detection rate, we introduce an adjustable sensitivity (AS) strategy for post-processing, which fully considers the advantages of multiple detection results and reasonably adds some areas with low confidence to the fine segmentation image in the form of centroid points. In addition, to further improve the performance and explore the characteristics of this task, on the one hand, we construct and find that a multi-stage loss is helpful for fine-grained detection. On the other hand, we find that a reasonable sliding window cropping strategy for test samples has better performance for actual multi-size samples. Extensive experimental results show that the proposed scheme achieves state-of-the-art (SOTA) performance. Notably, the proposed scheme won the third place in the "ICPR 2024 Resource-Limited Infrared Small Target Detection Challenge Track 1: Weakly Supervised Infrared Small Target Detection".
CVDec 15, 2024Code
From Easy to Hard: Progressive Active Learning Framework for Infrared Small Target Detection with Single Point SupervisionChuang Yu, Jinmiao Zhao, Yunpeng Liu et al.
Recently, single-frame infrared small target (SIRST) detection with single point supervision has drawn wide-spread attention. However, the latest label evolution with single point supervision (LESPS) framework suffers from instability, excessive label evolution, and difficulty in exerting embedded network performance. Inspired by organisms gradually adapting to their environment and continuously accumulating knowledge, we construct an innovative Progressive Active Learning (PAL) framework for single point supervision, which drives the existing SIRST detection networks progressively and actively recognizes and learns more hard samples to achieve significant performance improvements. Specifically, to avoid the early low-performance model leading to the wrong selection of hard samples, we propose a model pre-start concept, which focuses on automatically selecting a portion of easy samples and helping the model have basic task-specific learning capabilities. Meanwhile, we propose a refined dual-update strategy, which can promote reasonable learning of harder samples and continuous refinement of pseudo-labels. In addition, to alleviate the risk of excessive label evolution, a decay factor is reasonably introduced, which helps to achieve a dynamic balance between the expansion and contraction of target annotations. Extensive experiments show that existing SIRST detection networks equipped with our PAL framework have achieved state-of-the-art (SOTA) results on multiple public datasets. Furthermore, our PAL framework can build an efficient and stable bridge between full supervision and single point supervision tasks. Our code are available at https://github.com/YuChuang1205/PAL.
AIDec 5, 2025Code
MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large ModelsChuang Yu, Jinmiao Zhao, Mingxuan Zhao et al.
Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling, insufficient logical robustness, and are susceptible to misleading interpretations in complex scenarios. Therefore, we propose a Multi-rationale INtegrated Discriminative (MIND) reasoning framework, which is designed to endow MLLMs with human-like cognitive abilities of "Understand -> Rethink -> Correct", and achieves a paradigm evolution from passive imitation-based reasoning to active discriminative reasoning. Specifically, we introduce a Rationale Augmentation and Discrimination (RAD) paradigm, which automatically and efficiently expands existing datasets by generating diverse rationales, providing a unified and extensible data foundation. Meanwhile, we design a Progressive Two-stage Correction Learning (P2CL) strategy. The first phase enhances multi-rationale positive learning, while the second phase enables active logic discrimination and correction. In addition, to mitigate representation entanglement in the multi-rationale semantic space, we propose a Multi-rationale Contrastive Alignment (MCA) optimization strategy, which achieves semantic aggregation of correct reasoning and boundary separation of incorrect reasoning. Extensive experiments demonstrate that the proposed MIND reasoning framework achieves state-of-the-art (SOTA) performance on multiple public datasets covering scientific, commonsense, and mathematical scenarios. It provides a new perspective for advancing MLLMs towards higher levels of cognitive intelligence. Our code is available at https://github.com/YuChuang1205/MIND
CVDec 5, 2025Code
Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient ParadigmChuang Yu, Jinmiao Zhao, Yunpeng Liu et al.
While large-scale visual foundation models (VFMs) exhibit strong generalization across diverse visual domains, their potential for single-frame infrared small target (SIRST) detection remains largely unexplored. To fill this gap, we systematically introduce the frozen representations from VFMs into the SIRST task for the first time and propose a Foundation-Driven Efficient Paradigm (FDEP), which can seamlessly adapt to existing encoder-decoder-based methods and significantly improve accuracy without additional inference overhead. Specifically, a Semantic Alignment Modulation Fusion (SAMF) module is designed to achieve dynamic alignment and deep fusion of the global semantic priors from VFMs with task-specific features. Meanwhile, to avoid the inference time burden introduced by VFMs, we propose a Collaborative Optimization-based Implicit Self-Distillation (CO-ISD) strategy, which enables implicit semantic transfer between the main and lightweight branches through parameter sharing and synchronized backpropagation. In addition, to unify the fragmented evaluation system, we construct a Holistic SIRST Evaluation (HSE) metric that performs multi-threshold integral evaluation at both pixel-level confidence and target-level robustness, providing a stable and comprehensive basis for fair model comparison. Extensive experiments demonstrate that the SIRST detection networks equipped with our FDEP framework achieve state-of-the-art (SOTA) performance on multiple public datasets. Our code is available at https://github.com/YuChuang1205/FDEP-Framework
CVJun 4, 2024Code
Multi-Scale Direction-Aware Network for Infrared Small Target DetectionJinmiao Zhao, Zelin Shi, Chuang Yu et al.
Infrared small target detection faces the problem that it is difficult to effectively separate the background and the target. Existing deep learning-based methods focus on edge and shape features, but ignore the richer structural differences and detailed information embedded in high-frequency components from different directions, thereby failing to fully exploit the value of high-frequency directional features in target perception. To address this limitation, we propose a multi-scale direction-aware network (MSDA-Net), which is the first attempt to integrate the high-frequency directional features of infrared small targets as domain prior knowledge into neural networks. Specifically, to fully mine the high-frequency directional features, on the one hand, a high-frequency direction injection (HFDI) module without trainable parameters is constructed to inject the high-frequency directional information of the original image into the network. On the other hand, a multi-scale direction-aware (MSDA) module is constructed, which promotes the full extraction of local relations at different scales and the full perception of key features in different directions. In addition, considering the characteristics of infrared small targets, we construct a feature aggregation (FA) structure to address target disappearance in high-level feature maps, and a feature calibration fusion (FCF) module to alleviate feature bias during cross-layer feature fusion. Extensive experimental results show that our MSDA-Net achieves state-of-the-art (SOTA) results on multiple public datasets. The code can be available at https://github.com/YuChuang1205/MSDA-Net
CVDec 15, 2024Code
Why and How: Knowledge-Guided Learning for Cross-Spectral Image Patch MatchingChuang Yu, Yunpeng Liu, Jinmiao Zhao et al.
Recently, cross-spectral image patch matching based on feature relation learning has attracted extensive attention. However, performance bottleneck problems have gradually emerged in existing methods. To address this challenge, we make the first attempt to explore a stable and efficient bridge between descriptor learning and metric learning, and construct a knowledge-guided learning network (KGL-Net), which achieves amazing performance improvements while abandoning complex network structures. Specifically, we find that there is feature extraction consistency between metric learning based on feature difference learning and descriptor learning based on Euclidean distance. This provides the foundation for bridge building. To ensure the stability and efficiency of the constructed bridge, on the one hand, we conduct an in-depth exploration of 20 combined network architectures. On the other hand, a feature-guided loss is constructed to achieve mutual guidance of features. In addition, unlike existing methods, we consider that the feature mapping ability of the metric branch should receive more attention. Therefore, a hard negative sample mining for metric learning (HNSM-M) strategy is constructed. To the best of our knowledge, this is the first time that hard negative sample mining for metric networks has been implemented and brings significant performance gains. Extensive experimental results show that our KGL-Net achieves SOTA performance in three different cross-spectral image patch matching scenarios. Our code are available at https://github.com/YuChuang1205/KGL-Net.
CVMar 18, 2024Code
Relational Representation Learning Network for Cross-Spectral Image Patch MatchingChuang Yu, Yunpeng Liu, Jinmiao Zhao et al.
Recently, feature relation learning has drawn widespread attention in cross-spectral image patch matching. However, existing related research focuses on extracting diverse relations between image patch features and ignores sufficient intrinsic feature representations of individual image patches. Therefore, we propose an innovative relational representation learning idea that simultaneously focuses on sufficiently mining the intrinsic features of individual image patches and the relations between image patch features. Based on this, we construct a Relational Representation Learning Network (RRL-Net). Specifically, we innovatively construct an autoencoder to fully characterize the individual intrinsic features, and introduce a feature interaction learning (FIL) module to extract deep-level feature relations. To further fully mine individual intrinsic features, a lightweight multi-dimensional global-to-local attention (MGLA) module is constructed to enhance the global feature extraction of individual image patches and capture local dependencies within global features. By combining the MGLA module, we further explore the feature extraction network and construct an attention-based lightweight feature extraction (ALFE) network. In addition, we propose a multi-loss post-pruning (MLPP) optimization strategy, which greatly promotes network optimization while avoiding increases in parameters and inference time. Extensive experiments demonstrate that our RRL-Net achieves state-of-the-art (SOTA) performance on multiple public datasets. Our code are available at https://github.com/YuChuang1205/RRL-Net.