ROJun 5, 2025Code
MineInsight: A Multi-sensor Dataset for Humanitarian Demining Robotics in Off-Road EnvironmentsMario Malizia, Charles Hamesse, Ken Hasselmann et al.
The use of robotics in humanitarian demining increasingly involves computer vision techniques to improve landmine detection capabilities. However, in the absence of diverse and realistic datasets, the reliable validation of algorithms remains a challenge for the research community. In this paper, we introduce MineInsight, a publicly available multi-sensor, multi-spectral dataset designed for off-road landmine detection. The dataset features 35 different targets (15 landmines and 20 commonly found objects) distributed along three distinct tracks, providing a diverse and realistic testing environment. MineInsight is, to the best of our knowledge, the first dataset to integrate dual-view sensor scans from both an Unmanned Ground Vehicle and its robotic arm, offering multiple viewpoints to mitigate occlusions and improve spatial awareness. It features two LiDARs, as well as images captured at diverse spectral ranges, including visible (RGB, monochrome), visible short-wave infrared (VIS-SWIR), and long-wave infrared (LWIR). Additionally, the dataset comes with an estimation of the location of the targets, offering a benchmark for evaluating detection algorithms. We recorded approximately one hour of data in both daylight and nighttime conditions, resulting in around 38,000 RGB frames, 53,000 VIS-SWIR frames, and 108,000 LWIR frames. MineInsight serves as a benchmark for developing and evaluating landmine detection algorithms. Our dataset is available at https://github.com/mariomlz99/MineInsight.
CVSep 25, 2025
Vision Transformers: the threat of realistic adversarial patchesKasper Cools, Clara Maathuis, Alexander M. van Oers et al.
The increasing reliance on machine learning systems has made their security a critical concern. Evasion attacks enable adversaries to manipulate the decision-making processes of AI systems, potentially causing security breaches or misclassification of targets. Vision Transformers (ViTs) have gained significant traction in modern machine learning due to increased 1) performance compared to Convolutional Neural Networks (CNNs) and 2) robustness against adversarial perturbations. However, ViTs remain vulnerable to evasion attacks, particularly to adversarial patches, unique patterns designed to manipulate AI classification systems. These vulnerabilities are investigated by designing realistic adversarial patches to cause misclassification in person vs. non-person classification tasks using the Creases Transformation (CT) technique, which adds subtle geometric distortions similar to those occurring naturally when wearing clothing. This study investigates the transferability of adversarial attack techniques used in CNNs when applied to ViT classification models. Experimental evaluation across four fine-tuned ViT models on a binary person classification task reveals significant vulnerability variations: attack success rates ranged from 40.04% (google/vit-base-patch16-224-in21k) to 99.97% (facebook/dino-vitb16), with google/vit-base-patch16-224 achieving 66.40% and facebook/dinov3-vitb16 reaching 65.17%. These results confirm the cross-architectural transferability of adversarial patches from CNNs to ViTs, with pre-training dataset scale and methodology strongly influencing model resilience to adversarial attacks.