AI | CL | Apr 17, 2025

Antidistillation Sampling

CMU, Stanford
arXiv:2504.13146v5 | 14 citations | h-index: 22
Originality: Incremental advance
AI Analysis

This paper addresses a security vulnerability for model owners by protecting against distillation attacks, though it appears incremental because it builds on existing sampling techniques.

The paper tackles the problem of preventing unauthorized model distillation by proposing antidistillation sampling, which modifies token probabilities to poison reasoning traces, reducing distillation effectiveness while maintaining model utility.

Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability. By strategically modifying a model's next-token probability distribution, antidistillation sampling poisons reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility. For further details, see https://antidistillation.com.
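To make "modifying a model's next-token probability distribution" concrete, here is a minimal sketch of the general idea. It is not the paper's algorithm. It assumes a hypothetical per-token `penalty` score (standing in for some estimate of how much each token would help a distilling student), a hypothetical strength parameter `lam`, and a simple subtractive form for the adjustment. The abstract does not say how such a score is computed, so the example takes it as given.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_distribution(logits, penalty, lam=1.0, temperature=1.0):
    """Return softmax(logits / temperature - lam * penalty).

    `penalty` and `lam` are illustrative assumptions, not names from the paper.
    With lam = 0 this reduces to ordinary temperature sampling.
    """
    if len(logits) != len(penalty):
        raise ValueError("logits and penalty must have the same length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    adjusted = [l / temperature - lam * p for l, p in zip(logits, penalty)]
    return softmax(adjusted)


def sample_token(probs, rng):
    """Draw one token index from a probability vector."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]    # toy next-token logits
    penalty = [1.0, 0.0, 0.0]   # hypothetical: token 0 is "useful to a student"
    probs = adjusted_distribution(logits, penalty, lam=2.0)
    print(probs, sample_token(probs, random.Random(0)))
```

The strength parameter captures the trade-off the abstract describes: a larger `lam` moves the sampling distribution further from the model's own, which should hurt distillation more but also risks degrading the model's outputs.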

