CV · AI · Apr 14

UNBOX: Unveiling Black-box visual models with Natural-language

arXiv:2603.08639 · 11.7 · h-index: 20
Predicted impact: top 70% in CV · last 90 days · Originality: Highly original
AI Analysis

For practitioners who need to audit proprietary vision APIs, UNBOX offers a fully data-free and gradient-free interpretability method that performs competitively with white-box approaches.

UNBOX enables class-wise model dissection of black-box vision APIs using only output probabilities, producing interpretable text descriptors that reveal learned concepts and biases. It achieves competitive performance with white-box methods on ImageNet-1K, Waterbirds, and CelebA.

Ensuring trustworthiness in open-world visual recognition requires models that are interpretable, fair, and robust to distribution shifts. Yet modern vision systems are increasingly deployed as proprietary black-box APIs, exposing only output probabilities and hiding architecture, parameters, gradients, and training data. This opacity prevents meaningful auditing, bias detection, and failure analysis. Existing explanation methods assume white- or gray-box access or knowledge of the training distribution, making them unusable in these real-world settings. We introduce UNBOX, a framework for class-wise model dissection under fully data-free, gradient-free, and backpropagation-free constraints. UNBOX leverages Large Language Models and text-to-image diffusion models to recast activation maximization as a purely semantic search driven by output probabilities. The method produces human-interpretable text descriptors that maximally activate each class, revealing the concepts a model has implicitly learned, the training distribution it reflects, and potential sources of bias. We evaluate UNBOX on ImageNet-1K, Waterbirds, and CelebA through semantic fidelity tests, visual-feature correlation analyses and slice-discovery auditing. Despite operating under the strictest black-box constraints, UNBOX performs competitively with state-of-the-art white-box interpretability methods. This demonstrates that meaningful insight into a model's internal reasoning can be recovered without any internal access, enabling more trustworthy and accountable visual recognition systems.
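
The abstract does not spell out the search procedure, so the following is only a rough sketch of what "activation maximization as a purely semantic search driven by output probabilities" could look like, not the authors' algorithm. Every name here (`semantic_search`, `toy_render`, `toy_query_probs`, `make_toy_proposer`) is hypothetical. The LLM proposer, the text-to-image diffusion model, and the black-box API are replaced by toy stand-ins so the loop runs on its own. The only signal the search uses is the target-class probability returned by the model.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[float]            # stand-in for a rendered image
Scored = List[Tuple[str, float]]

def semantic_search(
    target_class: int,
    seed_descriptors: Sequence[str],
    propose: Callable[[Scored], List[str]],
    render: Callable[[str], Image],
    query_probs: Callable[[Image], Sequence[float]],
    rounds: int = 5,
    keep: int = 3,
) -> Scored:
    """Search over text descriptors using only output probabilities.

    No gradients, weights, or training data are touched: each candidate
    text is rendered, sent to the black box, and scored by the
    probability assigned to `target_class`.
    """
    scored: Dict[str, float] = {}
    candidates = list(seed_descriptors)
    for _ in range(rounds):
        for text in candidates:
            if text not in scored:  # avoid paying for repeated API queries
                scored[text] = query_probs(render(text))[target_class]
        best = sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:keep]
        candidates = propose(best)  # assumption: an LLM would refine these
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:keep]

# ---- Toy stand-ins (assumptions, not real models) ----
VOCAB = ["water", "forest", "bamboo", "lake", "beach", "reeds"]

def toy_render(text: str) -> Image:
    # Pretend text-to-image model: word counts play the role of pixels.
    words = text.split()
    return [float(words.count(w)) for w in VOCAB]

def toy_query_probs(image: Image) -> List[float]:
    # Pretend black-box API with two classes. Its hidden weights give
    # class 0 a spurious preference for water-like backgrounds.
    weights = [
        [1.0, -1.0, -0.5, 1.0, 0.5, 0.8],
        [-1.0, 1.0, 0.5, -1.0, -0.5, -0.8],
    ]
    logits = [sum(w * x for w, x in zip(row, image)) for row in weights]
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def make_toy_proposer(seed: int = 0) -> Callable[[Scored], List[str]]:
    # Pretend LLM: extends each surviving descriptor with a random word.
    rng = random.Random(seed)
    def propose(best: Scored) -> List[str]:
        return [f"{text} {rng.choice(VOCAB)}" for text, _score in best for _ in range(2)]
    return propose
```

Calling `semantic_search(0, ["a bird", "a bird forest"], make_toy_proposer(), toy_render, toy_query_probs)` returns the highest-scoring descriptors for class 0 along with their probabilities. In this toy setting, descriptors that accumulate water-related words are favored, which is the kind of background bias that a Waterbirds-style audit is meant to surface. A real pipeline would also have to handle query budgets and the randomness of image generation, which this sketch ignores.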

