CV · Dec 3, 2025

Multi-Scale Visual Prompting for Lightweight Small-Image Classification

arXiv:2512.03663v1 · 2 citations · h-index: 3
Originality: Synthesis-oriented
AI Analysis

This work fills a gap in visual prompting for lightweight small-image classification, a setting that matters for education, prototyping, and research. The contribution is incremental: it extends an existing paradigm to new data.

The paper adapts visual prompting to small-image benchmarks such as MNIST, Fashion-MNIST, and CIFAR-10, which prior prompting work had largely overlooked. It introduces Multi-Scale Visual Prompting (MSVP), which adds less than 0.02% parameters and significantly improves performance across various backbones.

Visual prompting has recently emerged as an efficient strategy to adapt vision models using lightweight, learnable parameters injected into the input space. However, prior work mainly targets large Vision Transformers and high-resolution datasets such as ImageNet. In contrast, small-image benchmarks like MNIST, Fashion-MNIST, and CIFAR-10 remain widely used in education, prototyping, and research, yet have received little attention in the context of prompting. In this paper, we introduce Multi-Scale Visual Prompting (MSVP), a simple and generic module that learns a set of global, mid-scale, and local prompt maps fused with the input image via a lightweight $1 \times 1$ convolution. MSVP is backbone-agnostic, adds less than $0.02\%$ parameters, and significantly improves performance across CNN and Vision Transformer backbones. We provide a unified benchmark on MNIST, Fashion-MNIST, and CIFAR-10 using a simple CNN, ResNet-18, and a small Vision Transformer. Our method yields consistent improvements with negligible computational overhead. We further include ablations on prompt scales, fusion strategies, and backbone architectures, along with qualitative analyses using prompt visualizations and Grad-CAM. Our results demonstrate that multi-scale prompting provides an effective inductive bias even on low-resolution images.
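
To make the module description in the abstract concrete, the following is a minimal forward-pass sketch in NumPy, not the authors' implementation. It assumes three prompt maps at resolutions 1×1 (global), 8×8 (mid-scale), and full image size (local), nearest-neighbour upsampling, channel-wise concatenation with the image, and a $1 \times 1$ convolution initialised to pass the image through unchanged. The class name `MSVPSketch`, the helper `upsample_nearest`, the prompt resolutions, and the initialisation are all hypothetical choices. The abstract does not specify them, so the parameter count of this sketch need not match the reported overhead.

```python
import numpy as np


def upsample_nearest(prompt, size):
    """Nearest-neighbour upsample a (C, h, w) map to (C, size, size)."""
    _, h, w = prompt.shape
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return prompt[:, rows][:, :, cols]


class MSVPSketch:
    """Hypothetical multi-scale prompt module (forward pass only)."""

    def __init__(self, channels=3, size=32, mid=8, seed=0):
        rng = np.random.default_rng(seed)
        self.size = size
        # Assumed prompt resolutions: global (1x1), mid-scale, local (full size).
        self.prompts = [
            rng.normal(0.0, 0.01, (channels, 1, 1)),
            rng.normal(0.0, 0.01, (channels, mid, mid)),
            rng.normal(0.0, 0.01, (channels, size, size)),
        ]
        # 1x1 convolution over [image, global, mid, local] = 4 * channels inputs.
        # Assumed init: identity on the image channels, zero on the prompts,
        # so the untrained module returns the input unchanged.
        self.weight = np.zeros((channels, 4 * channels))
        self.weight[:, :channels] = np.eye(channels)
        self.bias = np.zeros(channels)

    def __call__(self, x):
        """x has shape (N, C, size, size); the output has the same shape."""
        maps = [
            np.broadcast_to(upsample_nearest(p, self.size), x.shape)
            for p in self.prompts
        ]
        stacked = np.concatenate([x, *maps], axis=1)  # (N, 4C, H, W)
        # A 1x1 convolution is a per-pixel linear map over channels.
        fused = np.einsum("oc,nchw->nohw", self.weight, stacked)
        return fused + self.bias[None, :, None, None]

    def num_params(self):
        """Count prompt, weight, and bias entries."""
        return sum(p.size for p in self.prompts) + self.weight.size + self.bias.size
```

Because the output has the same shape as the input, a module of this kind can sit in front of any backbone, which is one way to read the "backbone-agnostic" claim. In practice the prompts and fusion weights would be trained with an autodiff framework; this sketch only shows the data flow.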

