CVApr 4

XSeg: A Large-scale X-ray Contraband Segmentation Benchmark For Real-World Security Screening

arXiv:2604.0370650.3
AI Analysis

For security screening researchers, this work provides a large-scale benchmark and an efficient annotation method to advance pixel-level contraband detection.

XSeg introduces the largest X-ray contraband segmentation dataset with 98,644 images and 295,932 instance masks across 30 categories, and proposes APSAM, a mask annotation model achieving superior performance with minimal user input.

X-ray contraband detection is critical for public safety. However, current methods primarily rely on bounding box annotations, which limit model generalization and performance due to the lack of pixel-level supervision and real-world data. To address these limitations, we introduce XSeg. To the best of our knowledge, XSeg is the largest X-ray contraband segmentation dataset to date, including 98,644 images and 295,932 instance masks, and contains the latest 30 common contraband categories. The images are sourced from public datasets and our synthesized data, filtered through a custom data cleaning pipeline to remove low-quality samples. To enable accurate and efficient annotation and reduce manual labeling effort, we propose Adaptive Point SAM (APSAM), a specialized mask annotation model built upon the Segment Anything Model (SAM). We address SAM's poor cross-domain generalization and limited capability in detecting stacked objects by introducing an Energy-Aware Encoder that enhances the initialization of the mask decoder, significantly improving sensitivity to overlapping items. Additionally, we design an Adaptive Point Generator that allows users to obtain precise mask labels with only a single coarse point prompt. Extensive experiments on XSeg demonstrate the superior performance of APSAM.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes