AI, CL · Apr 17, 2025

Antidistillation Sampling

Yash Savani, Asher Trockman, Zhili Feng, Yixuan Even Xu, Avi Schwarzschild, Alexander Robey, Marc Finzi, J. Zico Kolter

CMU, Stanford

arXiv:2504.13146v521.915 citations · h-index: 22

Originality: Incremental advance

AI Analysis

The work addresses a security concern for model owners by protecting against distillation attacks, though it appears incremental because it builds on existing sampling techniques.

The paper proposes antidistillation sampling to prevent unauthorized model distillation: it modifies token probabilities to poison reasoning traces, reducing distillation effectiveness while maintaining model utility.

Frontier models that generate extended reasoning traces inadvertently produce rich token sequences that can facilitate model distillation. Recognizing this vulnerability, model owners may seek sampling strategies that limit the effectiveness of distillation without compromising model performance. Antidistillation sampling provides exactly this capability. By strategically modifying a model's next-token probability distribution, antidistillation sampling poisons reasoning traces, rendering them significantly less effective for distillation while preserving the model's practical utility. For further details, see https://antidistillation.com.
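The abstract describes the mechanism only as a strategic modification of the next-token distribution. The sketch below illustrates that general idea under stated assumptions; it is not the authors' method. The `penalty` vector, the weight `lam`, and the additive form of the adjustment are all hypothetical: `penalty[i]` stands in for some score of how much sampling token `i` would be expected to hurt a distilled student, and computing such a score is the part the paper itself addresses.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_distribution(logits, penalty, lam=1.0, temperature=1.0):
    """Next-token distribution after an additive logit adjustment.

    ASSUMPTION: `penalty` is a hypothetical per-token score (higher means
    the token is expected to be less useful for distillation). The form
    logits / temperature + lam * penalty is illustrative only.
    """
    if len(logits) != len(penalty):
        raise ValueError("logits and penalty must have the same length")
    adjusted = [l / temperature + lam * p for l, p in zip(logits, penalty)]
    return softmax(adjusted)


def sample_token(probs, rng):
    """Draw one token index according to `probs`."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy vocabulary of four tokens with made-up numbers.
logits = [2.0, 1.0, 0.5, -1.0]
penalty = [-1.0, 0.5, 0.0, 0.0]

base = adjusted_distribution(logits, penalty, lam=0.0)      # ordinary sampling
poisoned = adjusted_distribution(logits, penalty, lam=1.0)  # adjusted sampling

rng = random.Random(0)
token = sample_token(poisoned, rng)
```

With `lam=0.0` the function reduces to ordinary temperature sampling, so `lam` acts as the dial between utility (staying close to the original distribution) and the strength of the adjustment.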
