CV · Dec 5, 2025

SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling

arXiv:2512.06185v1 · 1 citation
Originality: Incremental advance
AI Analysis

This work addresses the persistent fragility of modern deep classifiers, a concern for AI safety and robustness, though it is an incremental advance because it builds on prior fooling-image work.

The paper tackles the overconfidence of deep neural networks on out-of-distribution inputs. It introduces SPOOF, a simple black-box attack that generates high-confidence fooling images with minimal pixel modifications and reduced compute, and it shows that even state-of-the-art networks such as ViT-B/16 remain susceptible.

Deep neural networks (DNNs) excel across image recognition tasks, yet continue to exhibit overconfidence on inputs that bear no resemblance to natural images. Revisiting the "fooling images" work introduced by Nguyen et al. (2015), we re-implement both CPPN-based and direct-encoding-based evolutionary fooling attacks on modern architectures, including convolutional and transformer classifiers. Our re-implementations confirm that high-confidence fooling persists even in state-of-the-art networks, with the transformer-based ViT-B/16 emerging as the most susceptible: it reaches near-certain misclassifications with substantially fewer queries than convolution-based models. We then introduce SPOOF, a minimalist, consistent, and more efficient black-box attack that generates high-confidence fooling images. Despite its simplicity, SPOOF generates unrecognizable fooling images with minimal pixel modifications and drastically reduced compute. Furthermore, retraining with fooling images as an additional class provides only partial resistance, as SPOOF continues to fool the retrained models consistently with slightly higher query budgets, highlighting the persistent fragility of modern deep classifiers.
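The abstract does not spell out SPOOF's procedure, so the following is not the authors' algorithm. It is a minimal sketch of the general idea of a query-based black-box fooling attack built from simple pixel edits: start from a blank canvas, change one random pixel at a time, and keep the change only if the model's confidence in the target class rises. The names `toy_classifier` and `greedy_pixel_fooling`, the random linear model standing in for a real network, and the 8x8 image size are all assumptions made for illustration.

```python
import numpy as np

def toy_classifier(image, weights):
    """Hypothetical stand-in for a black-box model: returns softmax probabilities."""
    logits = weights @ image.ravel()
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

def greedy_pixel_fooling(query, shape, target, budget=500, seed=0):
    """Greedy random single-pixel search (an illustrative assumption, not SPOOF itself).

    `query` is only called for its output probabilities, so the attack needs
    no gradients or knowledge of the model's internals (black-box setting).
    """
    rng = np.random.default_rng(seed)
    image = np.zeros(shape)           # blank canvas, not a natural image
    best = query(image)[target]       # confidence of the target class so far
    queries = 1
    changed = set()                   # pixels whose edits were accepted
    while queries < budget:
        r = int(rng.integers(shape[0]))
        c = int(rng.integers(shape[1]))
        candidate = image.copy()
        candidate[r, c] = rng.random()  # new intensity in [0, 1)
        score = query(candidate)[target]
        queries += 1
        if score > best:              # keep only edits that raise confidence
            image, best = candidate, score
            changed.add((r, c))
    return image, best, queries, len(changed)

# Toy setup: 10 classes, 8x8 grayscale input, random linear "model".
model_rng = np.random.default_rng(42)
weights = model_rng.normal(size=(10, 8 * 8))
query = lambda img: toy_classifier(img, weights)

start_conf = query(np.zeros((8, 8)))[3]
image, conf, queries, n_changed = greedy_pixel_fooling(query, (8, 8), target=3)
print(f"confidence {start_conf:.3f} -> {conf:.3f} "
      f"after {queries} queries, {n_changed} pixels modified")
```

Against a real classifier, `query` would wrap a forward pass of the model under test, and the query count and number of modified pixels would be the quantities to compare across architectures.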

