CV | Aug 15, 2024
Beyond Full Labels: Energy-Double-Guided Single-Point Prompt for Infrared Small Target Label Generation
Shuai Yuan, Hanlin Qin, Renke Kou et al.
We pioneer a learning-based single-point prompt paradigm for infrared small target label generation (IRSTLG) to lower annotation burdens. Unlike previous clustering-based methods, we start from the intuition that point-guided mask generation requires just one more prompt than target detection; that is, IRSTLG can be treated as infrared small target detection (IRSTD) with a location hint. We therefore propose an elegant yet effective Energy-Double-Guided Single-point Prompt (EDGSP) framework that transforms a coarse IRSTD network into a refined label generation method. Specifically, EDGSP comprises three key modules: 1) target energy initialization (TEI), which establishes a foundational outline that simplifies the mapping process for effective shape evolution; 2) double prompt embedding (DPE), which rapidly localizes regions of interest and reinforces high-resolution individual edges to avoid label adhesion; and 3) bounding box-based matching (BBM), which eliminates false masks by considering comprehensive cluster boundary conditions to obtain a reliable output. In this way, pseudo labels generated by three backbones equipped with EDGSP achieve a 100% object-level probability of detection (Pd) and a 0% false-alarm rate (Fa) on the SIRST, NUDT-SIRST, and IRSTD-1k datasets, with a pixel-level intersection over union (IoU) improvement of 13.28% over state-of-the-art (SOTA) label generation methods. When our inferred masks are further used to train detection models, EDGSP, for the first time, enables single-point-generated pseudo masks to surpass manual labels. Even with coarse single-point annotations, it still achieves 99.5% of the performance of full labeling. Code is available at https://github.com/xdFai/EDGSP.
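The abstract scores labels with pixel-level IoU and describes BBM as removing false masks using cluster bounding-box conditions. As a rough illustration only, the sketch below keeps an 8-connected component of a predicted binary mask when its bounding box contains at least one point prompt, then measures pixel-level IoU against a reference mask. This single rule is an assumption and a simplification, not the authors' BBM (which considers more comprehensive boundary conditions); the names `connected_components`, `keep_prompted_components`, and `pixel_iou` are hypothetical.

```python
from collections import deque


def connected_components(mask):
    """Return the 8-connected foreground components of a binary mask as pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:  # breadth-first flood fill over the 8-neighbourhood
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def keep_prompted_components(mask, points):
    """Hypothetical filter: keep a component only if its bounding box contains a point prompt."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for pixels in connected_components(mask):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
        if any(y0 <= py <= y1 and x0 <= px <= x1 for py, px in points):
            for y, x in pixels:
                out[y][x] = 1
    return out


def pixel_iou(pred, gt):
    """Pixel-level intersection over union of two binary masks."""
    pairs = [(p, g) for rp, rg in zip(pred, gt) for p, g in zip(rp, rg)]
    inter = sum(1 for p, g in pairs if p and g)
    union = sum(1 for p, g in pairs if p or g)
    return inter / union if union else 1.0


# Usage: one true target near (1, 1) and one false blob near (5, 5).
pred = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1, 1],
]
gt = [row[:] for row in pred[:4]] + [[0] * 6, [0] * 6]
filtered = keep_prompted_components(pred, points=[(1, 1)])
print(pixel_iou(pred, gt), pixel_iou(filtered, gt))  # prints 0.5714285714285714 1.0
```

The point prompt here acts purely as a location hint that decides which clusters survive; the false blob is discarded because no prompt falls inside its bounding box.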
CV | Jan 28, 2024
ASCNet: Asymmetric Sampling Correction Network for Infrared Image Destriping
Shuai Yuan, Hanlin Qin, Xiang Yan et al.
In real-world infrared imaging systems, learning a consistent stripe noise removal model is essential. Most existing destriping methods cannot reconstruct images precisely because of cross-level semantic gaps and insufficient characterization of global column features. To tackle this problem, we propose a novel infrared image destriping method, the Asymmetric Sampling Correction Network (ASCNet), which captures global column relationships and embeds them into a U-shaped framework, providing comprehensive discriminative representation and seamless semantic connectivity. ASCNet consists of three core elements: the Residual Haar Discrete Wavelet Transform (RHDWT), Pixel Shuffle (PS), and the Column Non-uniformity Correction Module (CNCM). Specifically, RHDWT is a novel downsampler that employs double-branch modeling to integrate stripe-directional prior knowledge with data-driven semantic interaction, enriching the feature representation. Having observed semantic-pattern crosstalk caused by stripe noise, we introduce PS as an upsampler to prevent excessive a priori decoding and to perform semantic-bias-free image reconstruction. After each sampling step, CNCM captures long-range column dependencies. By incorporating column, spatial, and self-dependence information, CNCM establishes a global context that distinguishes stripes from the scene's vertical structures. Extensive experiments on synthetic data, real data, and infrared small target detection tasks demonstrate that the proposed method outperforms state-of-the-art single-image destriping methods both visually and quantitatively. Our code will be made publicly available at https://github.com/xdFai/ASCNet.
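ASCNet builds its downsampler on the Haar discrete wavelet transform and uses pixel shuffle for upsampling. The sketch below shows only these two generic operations on plain Python lists: a single-level orthonormal 2D Haar transform that splits an image into four half-resolution subbands, and a depth-to-space pixel shuffle that follows the index convention of PyTorch's `nn.PixelShuffle`. It does not reproduce RHDWT's residual double-branch design or CNCM; the function names and subband labels (`row_detail`, `col_detail`, and so on) are illustrative assumptions.

```python
def haar_dwt2(img):
    """Single-level orthonormal 2D Haar transform of an H x W image (H and W even).

    Returns four H/2 x W/2 subbands: (approx, row_detail, col_detail, diag_detail).
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    bands = [[[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)]
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r, s = i // 2, j // 2
            bands[0][r][s] = (a + b + c + d) / 2  # low-pass in both directions
            bands[1][r][s] = (a + b - c - d) / 2  # top row minus bottom row: horizontal structures
            bands[2][r][s] = (a - b + c - d) / 2  # left column minus right column: vertical structures such as stripes
            bands[3][r][s] = (a - b - c + d) / 2  # diagonal detail
    return bands


def pixel_shuffle(channels, r):
    """Depth-to-space: C*r*r maps of size H x W become C maps of size H*r x W*r."""
    if len(channels) % (r * r):
        raise ValueError("channel count must be divisible by r * r")
    c_out = len(channels) // (r * r)
    h, w = len(channels[0]), len(channels[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):
            for j in range(r):
                src = channels[c * r * r + i * r + j]  # same channel ordering as torch.nn.PixelShuffle
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Usage: alternating vertical stripes put their detail energy only in col_detail.
stripes = [[1.0 if j % 2 == 0 else 0.0 for j in range(4)] for _ in range(4)]
approx, row_detail, col_detail, diag_detail = haar_dwt2(stripes)
print(col_detail)  # [[1.0, 1.0], [1.0, 1.0]]
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], r=2))  # [[[1, 2], [3, 4]]]
```

Both operations are lossless rearrangements of pixels (the Haar transform is orthonormal, so it preserves energy), which is one plausible reason to prefer them over strided convolution or interpolation when the goal is to avoid discarding stripe information during sampling.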
CV | Oct 28, 2018
Object Tracking in Hyperspectral Videos with Convolutional Features and Kernelized Correlation Filter
Kun Qian, Jun Zhou, Fengchao Xiong et al.
Target tracking in hyperspectral videos is a new research topic. In this paper, a novel method based on a convolutional network and the Kernelized Correlation Filter (KCF) framework is presented for tracking objects of interest in hyperspectral videos. We extract a set of normalized three-dimensional cubes from the target region as fixed convolution filters, which contain the spectral information surrounding the target. The feature maps generated by the convolution operations are combined to form a three-dimensional representation of the object, providing an effective encoding of local spectral-spatial information. We show that a simple two-layer convolutional network is sufficient to learn robust representations without offline training on a large dataset. In the tracking step, KCF is adopted to distinguish the target from its surrounding environment. Experimental results demonstrate that the proposed method performs well on sample hyperspectral videos and outperforms several state-of-the-art methods tested on grayscale and color videos of the same scenes.
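The tracking step relies on KCF. The sketch below is a minimal one-dimensional, multi-channel version of the standard KCF recipe (Henriques et al., 2015): a Gaussian kernel correlation computed in the Fourier domain, ridge-regression training `alpha_f = y_f / (k_f + lam)`, and detection by locating the peak of the response. It assumes the feature maps are already available as equal-length channels (standing in for the convolutional features described above), uses a naive O(n^2) DFT to stay dependency-free, and omits the cosine window, 2D processing, and model update. All function names and parameter values (`sigma=0.5`, `lam=1e-4`) are hypothetical, not taken from the paper.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (O(n^2)); fine for short illustrative signals."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(xf):
    """Inverse of dft()."""
    n = len(xf)
    return [sum(xf[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def gaussian_correlation(x, z, sigma):
    """Gaussian kernel between x and every circular shift of z, summed over channels."""
    n = len(x[0])
    cross = [0.0] * n
    for xc, zc in zip(x, z):
        xf, zf = dft(xc), dft(zc)
        cc = idft([a.conjugate() * b for a, b in zip(xf, zf)])  # circular cross-correlation
        cross = [acc + v.real for acc, v in zip(cross, cc)]
    xx = sum(v * v for ch in x for v in ch)
    zz = sum(v * v for ch in z for v in ch)
    numel = n * len(x)
    return [math.exp(-max(xx + zz - 2 * c, 0.0) / (sigma ** 2 * numel)) for c in cross]


def train(x, y, sigma=0.5, lam=1e-4):
    """Kernel ridge regression in the Fourier domain: alpha_f = y_f / (k_f + lam)."""
    kf = dft(gaussian_correlation(x, x, sigma))
    return [a / (b + lam) for a, b in zip(dft(y), kf)]


def detect(alpha_f, x, z, sigma=0.5):
    """Response map for a new sample z; its peak gives the estimated circular shift."""
    kf = dft(gaussian_correlation(x, z, sigma))
    return [v.real for v in idft([a * b for a, b in zip(alpha_f, kf)])]


# Usage: 3 random feature channels; the "next frame" is the template shifted by 5.
n, shift = 32, 5
rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
y = [math.exp(-0.5 * min(i, n - i) ** 2) for i in range(n)]  # Gaussian label peaked at index 0
z = [[ch[(i - shift) % n] for i in range(n)] for ch in x]
response = detect(train(x, y), x, z)
print(max(range(n), key=response.__getitem__))  # 5
```

Because every circular shift of the template is handled implicitly through element-wise products in the Fourier domain, training and detection avoid building the full kernel matrix, which is what makes KCF fast enough for frame-by-frame tracking.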