CV | Apr 28, 2025

Shopformer: Transformer-Based Framework for Detecting Shoplifting via Human Pose

arXiv:2504.19970v1 | 6 citations | h-index: 10 | Has Code | 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)
Originality: Incremental advance
AI Analysis

The paper addresses the costly problem of retail shoplifting with a privacy-preserving, scalable approach, though the advance is incremental because it builds on existing transformer and pose-analysis methods.

The paper tackles shoplifting detection in retail by introducing Shopformer, a transformer-based model that analyzes pose sequences instead of raw video, outperforming state-of-the-art anomaly detection models on real-world data.

Shoplifting remains a costly issue for the retail sector, but traditional surveillance systems, which are mostly based on human monitoring, are still largely ineffective, with only about 2% of shoplifters being arrested. Existing AI-based approaches rely on pixel-level video analysis, which raises privacy concerns, is sensitive to environmental variations, and demands significant computational resources. To address these limitations, we introduce Shopformer, a novel transformer-based model that detects shoplifting by analyzing pose sequences rather than raw video. We propose a custom tokenization strategy that converts pose sequences into compact embeddings for efficient transformer processing. To the best of our knowledge, this is the first pose-sequence-based transformer model for shoplifting detection. Evaluated on real-world pose data, our method outperforms state-of-the-art anomaly detection models, offering a privacy-preserving and scalable solution for real-time retail surveillance. The code base for this work is available at https://github.com/TeCSAR-UNCC/Shopformer.
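
The abstract does not spell out how the tokenization works, so the following is only a minimal sketch of one generic way a pose sequence could be turned into compact tokens. It is not the authors' method. It assumes 17 keypoints per frame (a COCO-style layout), non-overlapping windows of 6 frames, and a random matrix standing in for a learned projection. The function name `tokenize_pose_sequence` and all parameters are hypothetical.

```python
import random


def tokenize_pose_sequence(poses, window, proj):
    """Turn a pose sequence into one embedding per window of frames.

    poses:  list of T frames, each a list of J (x, y) keypoints.
    window: frames per token (hypothetical hyperparameter).
    proj:   list of window * J * 2 rows, each with D weights; a stand-in
            for a learned linear projection.
    Returns a list of T // window tokens, each a list of D floats.
    """
    n_tokens = len(poses) // window  # trailing frames that do not fill a window are dropped
    dim = len(proj[0])
    tokens = []
    for t in range(n_tokens):
        flat = []
        for frame in poses[t * window:(t + 1) * window]:
            # Center each frame on its mean keypoint so the token does not
            # depend on where the person stands in the image.
            cx = sum(x for x, _ in frame) / len(frame)
            cy = sum(y for _, y in frame) / len(frame)
            for x, y in frame:
                flat.extend((x - cx, y - cy))
        if len(flat) != len(proj):
            raise ValueError("projection rows must equal window * J * 2")
        # Linear projection of the flattened window to a D-dimensional token.
        tokens.append(
            [sum(v * row[d] for v, row in zip(flat, proj)) for d in range(dim)]
        )
    return tokens


rng = random.Random(0)
# 24 frames of 17 synthetic keypoints each (assumed layout, random data).
poses = [[(rng.random(), rng.random()) for _ in range(17)] for _ in range(24)]
proj = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(6 * 17 * 2)]
tokens = tokenize_pose_sequence(poses, window=6, proj=proj)
print(len(tokens), len(tokens[0]))  # 4 32
```

In this sketch, 24 frames of 34 coordinates each become 4 tokens of 32 values, which illustrates why a pose-based input can be far cheaper for a transformer to process than raw video frames.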

Code Implementations: 1 repo
