CV | Aug 30, 2025

Make me an Expert: Distilling from Generalist Black-Box Models into Specialized Models for Semantic Segmentation

arXiv:2509.00509v1 | h-index: 30 | Has Code
Originality: Incremental advance
AI Analysis

This work addresses the challenge of model adaptation under realistic constraints for users of AI services, though it is incremental as it builds on existing distillation and attention techniques.

The paper tackles the problem of training local models using black-box AIaaS APIs that only provide one-hot predictions, by introducing the Black-Box Distillation setting and a method called ATGC that dynamically selects optimal input scales to address resolution sensitivity, achieving substantial improvements in semantic segmentation across multiple datasets.

The rise of Artificial Intelligence as a Service (AIaaS) democratizes access to pre-trained models via Application Programming Interfaces (APIs), but also raises a fundamental question: how can local models be effectively trained using black-box models that do not expose their weights, training data, or logits, a constraint under which current domain adaptation paradigms are impractical? To address this challenge, we introduce the Black-Box Distillation (B2D) setting, which enables local model adaptation under realistic constraints: (1) the API model is open-vocabulary and trained on large-scale general-purpose data, and (2) access is limited to one-hot predictions only. We identify that open-vocabulary models exhibit significant sensitivity to input resolution, with different object classes being segmented optimally at different scales, a limitation termed the "curse of resolution". Our method, ATtention-Guided sCaler (ATGC), addresses this challenge by leveraging DINOv2 attention maps to dynamically select optimal scales for black-box model inference. ATGC scores the attention maps with entropy to identify informative scales for pseudo-labelling, enabling effective distillation. Experiments demonstrate substantial improvements under black-box supervision across multiple datasets while requiring only one-hot API predictions. Our code is available at https://github.com/yasserben/ATGC.
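
The entropy-based scale scoring described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `attention_entropy` and `select_scale`, the toy attention maps, and the choice to prefer low entropy (treating a more concentrated map as more informative) are all assumptions made for illustration. The abstract does not state which direction of the entropy score ATGC favours, so that choice is exposed as a parameter. In practice the maps would come from DINOv2 run on the image resized to each candidate scale.

```python
import numpy as np

def attention_entropy(attn):
    """Shannon entropy (in nats) of an attention map normalized to sum to 1."""
    p = np.asarray(attn, dtype=np.float64).ravel()
    p = p / p.sum()
    nz = p[p > 0]  # skip zeros, since 0 * log(0) is taken as 0
    return float(-(nz * np.log(nz)).sum())

def select_scale(attn_by_scale, prefer_low_entropy=True):
    """Score one attention map per candidate scale and return (best_scale, scores).

    prefer_low_entropy is an assumption: it treats a concentrated map as
    the more informative one. Set it to False to pick the highest entropy.
    """
    scores = {s: attention_entropy(a) for s, a in attn_by_scale.items()}
    pick = min if prefer_low_entropy else max
    best = pick(scores, key=scores.get)
    return best, scores

# Toy stand-ins for attention maps at two hypothetical input scales.
diffuse = np.ones((8, 8))          # attention spread evenly over all patches
peaked = np.full((8, 8), 1e-3)     # attention concentrated on one patch
peaked[3, 4] = 1.0

best_scale, scores = select_scale({0.5: diffuse, 1.0: peaked})
print(best_scale, scores)
```

Under these assumptions, the selected scale would be the one at which the image is sent to the black-box API, and the returned one-hot prediction would serve as the pseudo-label for training the local model.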

