CV · Oct 21, 2022

Distilling the Undistillable: Learning from a Nasty Teacher

Surgan Jandial, Yash Khasbage, Arghya Pal, Vineeth N Balasubramanian, Balaji Krishnamurthy

arXiv:2210.11728v16.57 citations · h-index: 37 · Has Code

Originality Incremental advance

AI Analysis

This work addresses security vulnerabilities in machine learning for practitioners concerned with model confidentiality, though it is incremental as it builds on existing defense methods.

The paper bypasses the 'Nasty Teacher' defense against knowledge distillation, extracting information the defense is meant to protect and increasing what a student learns from a Nasty Teacher by up to 68.63% on standard datasets.

The inadvertent stealing of private or sensitive information through Knowledge Distillation (KD) has received significant attention recently and, given its critical nature, has guided subsequent defense efforts. The recent work Nasty Teacher proposed developing teachers that cannot be distilled or imitated by models attacking them. However, the promise of confidentiality offered by a nasty teacher is not well studied, and as a further step toward closing such loopholes, we attempt to bypass its defense and successfully steal (or extract) information in its presence. Specifically, we analyze Nasty Teacher from two different directions and leverage both to develop simple yet efficient methodologies, named HTC and SCM, which increase the learning from Nasty Teacher by up to 68.63% on standard datasets. Additionally, we explore an improvised defense method based on our insights into stealing. Our detailed set of experiments and ablations on diverse models and settings demonstrates the efficacy of our approach.
