Yunda Sun

CV
h-index: 18
Papers: 6
Citations: 161
Novelty: 46%
AI Score: 38

6 Papers

CV | Nov 19, 2025 | Code
Taming Generative Synthetic Data for X-ray Prohibited Item Detection

Jialong Sun, Hongguang Zhu, Weizhe Liu et al.

Training prohibited item detection models requires a large number of X-ray security images, but collecting and annotating these images is time-consuming and laborious. To address data insufficiency, X-ray security image synthesis methods composite images to scale up datasets. However, previous methods primarily follow a two-stage pipeline: labor-intensive foreground extraction in the first stage, then image compositing in the second. Such a pipeline inevitably adds labor cost and is inefficient. In this paper, we propose a one-stage X-ray security image synthesis pipeline (Xsyn) based on text-to-image generation, which incorporates two effective strategies to improve the usability of synthetic images. The Cross-Attention Refinement (CAR) strategy leverages the cross-attention map from the diffusion model to refine the bounding box annotation. The Background Occlusion Modeling (BOM) strategy explicitly models background occlusion in the latent space to enhance imaging complexity. To the best of our knowledge, Xsyn is the first method to achieve high-quality X-ray security image synthesis without extra labor cost. Experiments demonstrate that our method outperforms all previous methods with a 1.2% mAP improvement, and that the synthetic images it generates improve prohibited item detection performance across various X-ray security datasets and detectors. Code is available at https://github.com/pILLOW-1/Xsyn/.
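
The CAR idea above (tightening a box using a cross-attention map) can be illustrated with a minimal sketch. This is not the Xsyn implementation: the function name `box_from_attention`, the relative threshold of 0.5, and the "tight box around above-threshold cells" rule are all assumptions chosen for illustration, and the attention map is a hand-written toy grid rather than one taken from a diffusion model.

```python
def box_from_attention(attn, rel_threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) around the cells whose attention is
    at least rel_threshold of the way from the map's minimum to its maximum.

    Hypothetical helper: the thresholding rule is an assumption, not the
    procedure used by CAR in the paper.
    """
    lo = min(min(row) for row in attn)
    hi = max(max(row) for row in attn)
    if hi == lo:
        return None  # flat map: no localisation signal to exploit
    cutoff = lo + rel_threshold * (hi - lo)
    hits = [(x, y)
            for y, row in enumerate(attn)
            for x, value in enumerate(row)
            if value >= cutoff]
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x4 attention map with a bright 2x2 region in the middle.
attn = [
    [0.0, 0.1, 0.1, 0.0],
    [0.1, 0.8, 0.9, 0.1],
    [0.0, 0.7, 1.0, 0.2],
    [0.0, 0.1, 0.2, 0.0],
]
refined_box = box_from_attention(attn)
```

On this toy map the refined box covers only the bright central cells, in inclusive grid coordinates.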

CV | Nov 30, 2024
BGM: Background Mixup for X-ray Prohibited Items Detection

Weizhe Liu, Renshuai Tao, Hongguang Zhu et al.

Current data-driven approaches for X-ray prohibited items detection remain under-explored, particularly in the design of effective data augmentations. Existing natural image augmentations for reflected light imaging neglect the data characteristics of X-ray security images. Moreover, prior X-ray augmentation methods have predominantly focused on foreground prohibited items, overlooking informative background cues. In this paper, we propose Background Mixup (BGM), a background-based augmentation technique tailored for the X-ray security imaging domain. Unlike conventional methods, BGM is founded on an in-depth analysis of two physical properties: 1) X-ray Transmission Imagery: transmitted X-ray pixels represent composite information from multiple materials along the imaging path. 2) Material-based Pseudo-coloring: pseudo-coloring in X-ray images correlates directly with material properties, aiding in material distinction. Building upon these insights, BGM mixes background patches across regions in terms of both 1) texture structure and 2) material variation, so that models benefit from complicated background cues. This enhances the model's capability to handle domain-specific challenges such as occlusion-induced discriminative imbalance. Importantly, BGM is orthogonal to and fully compatible with existing foreground-focused augmentation techniques, enabling joint use to further enhance detection performance. Extensive experiments on multiple X-ray security benchmarks show that BGM consistently surpasses strong baselines, without additional annotations or significant training overhead. This work pioneers the exploration of background-aware augmentation in X-ray prohibited items detection and provides a lightweight, plug-and-play solution with broad applicability.
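
To make the notion of mixing background patches concrete, here is a minimal sketch under stated assumptions. It is not BGM itself: the abstract does not specify how texture and material are mixed, so the sketch uses a plain linear blend (`lam`) on a single-channel toy image, and the function name `mix_background_patch`, the inclusive box format, and the rule "leave foreground pixels untouched" are all hypothetical.

```python
def mix_background_patch(target, donor, patch, fg_boxes, lam=0.5):
    """Blend a rectangular patch of `donor` into `target`, skipping any pixel
    that lies inside a foreground (prohibited item) box.

    All boxes are (x_min, y_min, x_max, y_max) with inclusive coordinates.
    The linear blend is an illustrative assumption, not the paper's operator.
    """
    x0, y0, x1, y1 = patch
    out = [row[:] for row in target]  # copy so the input image is not modified
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            in_foreground = any(
                bx0 <= x <= bx1 and by0 <= y <= by1
                for bx0, by0, bx1, by1 in fg_boxes
            )
            if not in_foreground:
                out[y][x] = lam * target[y][x] + (1 - lam) * donor[y][x]
    return out


# Toy 3x3 single-channel images: a bright target and a dark donor.
target = [[1.0] * 3 for _ in range(3)]
donor = [[0.0] * 3 for _ in range(3)]
mixed = mix_background_patch(
    target, donor, patch=(0, 0, 1, 1), fg_boxes=[(1, 1, 2, 2)], lam=0.5
)
```

Only the three background pixels inside the patch are blended; the pixel at (1, 1) belongs to the foreground box and keeps its original value, so existing box annotations remain valid.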

CV | Oct 14, 2020
Towards Optimal Filter Pruning with Balanced Performance and Pruning Speed

Dong Li, Sitong Chen, Xudong Liu et al.

Filter pruning has drawn increasing attention because resource-constrained platforms require more compact models for deployment. However, current pruning methods suffer either from the inferior performance of one-shot methods or from the expensive time cost of iterative training methods. In this paper, we propose a filter pruning method that balances performance and pruning speed. Based on the filter importance criteria, our method is able to prune a layer with an approximately layer-wise optimal pruning rate at a preset loss variation. The network is pruned layer by layer without time-consuming prune-retrain iterations. If a pre-defined pruning rate for the entire network is given, we also introduce a method to find the corresponding loss variation threshold with fast convergence. Moreover, we propose layer group pruning and a channel selection mechanism for channel alignment in networks with short connections. The proposed pruning method is widely applicable to common architectures and does not involve any additional training except the final fine-tuning. Comprehensive experiments show that our method outperforms many state-of-the-art approaches.
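
The step of finding a loss variation threshold that matches a preset network-wide pruning rate can be sketched as a search over a monotone function. The code below is an illustration under assumptions, not the paper's algorithm: it treats the summed importance of removed filters as a stand-in for loss variation, uses plain bisection as the search, and the names `prune_layer`, `pruning_rate`, and `find_threshold` are hypothetical.

```python
def prune_layer(importances, threshold):
    """Count how many of the least important filters can be removed while
    their summed importance (assumed proxy for loss variation) stays within
    the threshold."""
    total, count = 0.0, 0
    for score in sorted(importances):
        if total + score > threshold:
            break
        total += score
        count += 1
    return count


def pruning_rate(layers, threshold):
    """Fraction of all filters removed when every layer uses the same threshold."""
    pruned = sum(prune_layer(layer, threshold) for layer in layers)
    return pruned / sum(len(layer) for layer in layers)


def find_threshold(layers, target_rate, tol=1e-6):
    """Bisect for (approximately) the smallest threshold whose pruning rate
    reaches target_rate. Valid because the rate never decreases as the
    threshold grows."""
    lo, hi = 0.0, max(sum(layer) for layer in layers)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pruning_rate(layers, mid) >= target_rate:
            hi = mid
        else:
            lo = mid
    return hi


# Two toy layers with four filters each; ask for half of all filters removed.
layers = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 6.0, 10.0]]
threshold = find_threshold(layers, target_rate=0.5)
```

For these toy scores the search settles just above 4.0, where each layer loses its two least important filters. No retraining happens inside the loop, which mirrors the abstract's point that the threshold search avoids prune-retrain iterations.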

CV | Mar 17, 2020
Feedback Graph Convolutional Network for Skeleton-based Action Recognition

Hao Yang, Dan Yan, Li Zhang et al.

Skeleton-based action recognition has attracted considerable attention in computer vision since skeleton data is more robust to dynamic circumstances and complicated backgrounds than other modalities. Recently, many researchers have used the Graph Convolutional Network (GCN) to model spatial-temporal features of skeleton sequences through end-to-end optimization. However, conventional GCNs are feedforward networks in which low-level layers cannot access semantic information from the high-level layers. In this paper, we propose a novel network, named Feedback Graph Convolutional Network (FGCN). This is the first work that introduces the feedback mechanism into GCNs and action recognition. Compared with conventional GCNs, FGCN has the following advantages: (1) a multi-stage temporal sampling strategy is designed to extract spatial-temporal features for action recognition in a coarse-to-fine progressive process; (2) a dense-connection-based Feedback Graph Convolutional Block (FGCB) is proposed to introduce feedback connections into the GCNs. It transmits the high-level semantic features to the low-level layers and passes temporal information stage by stage to progressively model global spatial-temporal features for action recognition; (3) the FGCN model provides early predictions. In the early stages, the model receives only partial information about actions, so its predictions are relatively coarse. These coarse predictions are treated as a prior to guide the feature learning of later stages toward an accurate prediction. Extensive experiments on the NTU-RGB+D, NTU-RGB+D120 and Northwestern-UCLA datasets demonstrate that the proposed FGCN is effective for action recognition. It achieves state-of-the-art performance on all three datasets.

CV | Sep 9, 2019
Saliency based Semi-supervised Learning for Orbiting Satellite Tracking

Peizhuo Li, Yunda Sun, Xue Wan

The trajectory and boundary of an orbiting satellite are fundamental information for on-orbit repair and manipulation by space robots. This task, however, is challenging owing to the free and rapid motion of orbiting satellites, the quickly varying background and sudden changes in illumination conditions. Traditional tracking usually relies on a single bounding box of the target object; however, visual tracking should provide more detailed information, such as a binary mask. In this paper, we propose SSLT (Saliency-based Semi-supervised Learning for Tracking), an algorithm that provides both the bounding box and the binary segmentation mask of target satellites at 12 frames per second without requiring annotated data. SSLT improves segmentation performance by applying a saliency-map-based semi-supervised online learning approach within the initial bounding box estimated by tracking. Once a customized segmentation model has been trained, the bounding box and satellite trajectory are refined using the binary segmentation result. Experiments using real on-orbit rendezvous and docking video from NASA (National Aeronautics and Space Administration), simulated satellite animation sequences from ESA (European Space Agency) and image sequences of a 3D-printed satellite model taken in our laboratory demonstrate the robustness, versatility and speed of our method compared to state-of-the-art tracking and segmentation methods. Our dataset will be released for academic use in the future.

CV | Jul 26, 2019
MVB: A Large-Scale Dataset for Baggage Re-Identification and Merged Siamese Networks

Zhulin Zhang, Dong Li, Jinhua Wu et al.

In this paper, we present a novel dataset named MVB (Multi View Baggage) for the baggage ReID task, which has some essential differences from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset of its kind; it contains 4519 baggage identities and 22660 annotated baggage images, as well as their surface material labels. Second, all baggage images are captured by a specially designed multi-view camera system to handle pose variation and occlusion, in order to obtain 3D information about the baggage surface that is as complete as possible. Third, MVB has remarkable inter-class similarity and intra-class dissimilarity, since different pieces of baggage may have very similar appearance while the data is collected in two real airport environments whose imaging factors vary significantly from each other. Moreover, we propose a merged Siamese network as a baseline model and evaluate its performance. Experiments and a case study are conducted on MVB.