CR, LG | Jun 14, 2019

Effectiveness of Distillation Attack and Countermeasure on Neural Network Watermarking

arXiv:1906.06046v1 | 34 citations
Originality: Incremental advance
AI Analysis

This paper addresses the problem of protecting model copyright for machine learning service providers and model owners. Its contribution is incremental, as it builds on existing watermarking techniques.

The paper tackles the vulnerability of neural network watermarks to distillation attacks, showing that distillation effectively removes watermarks, and proposes a countermeasure called ingrain that improves robustness against such attacks while maintaining performance against other transformations.

The rise of machine learning as a service and model sharing platforms has raised the need for traitor tracing of models and proof of authorship. Watermarking is the main component of existing methods for protecting the copyright of models. In this paper, we show that distillation, a widely used transformation technique, is a quite effective attack for removing watermarks embedded by existing algorithms. The fragility arises because distillation does not retain a watermark that is embedded in the model but is redundant and independent of the main learning task. We design ingrain in response to the destructive distillation. It regularizes a neural network with an ingrainer model, which contains the watermark, and forces the model to also represent the knowledge of the ingrainer. Our extensive evaluations show that ingrain is more robust to distillation attacks, and that its robustness against other widely used transformation techniques is comparable to that of existing methods.
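The regularization idea in the abstract can be illustrated with a minimal sketch. It is not the paper's exact objective: the function names, the `(1 - lam)` / `lam` weighting, and the default temperature are assumptions chosen for illustration. The sketch combines an ordinary task loss on the hard label with a second term that pulls the model's temperature-softened outputs toward those of an ingrainer model, which is the same soft-target mechanism that distillation uses.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p + eps)
                for t, p in zip(target_probs, predicted_probs))

def ingrain_style_loss(model_logits, ingrainer_logits, label,
                       lam=0.5, temperature=2.0):
    """Hypothetical combined loss (weighting and names are assumptions).

    task term:    model prediction vs. the hard label
    ingrain term: softened model output vs. softened ingrainer output
    """
    one_hot = [1.0 if i == label else 0.0
               for i in range(len(model_logits))]
    task = cross_entropy(one_hot, softmax(model_logits))
    ingrain = cross_entropy(softmax(ingrainer_logits, temperature),
                            softmax(model_logits, temperature))
    return (1.0 - lam) * task + lam * ingrain
```

With `lam=0.0` the loss reduces to the plain task loss. As `lam` grows, the model is pushed harder to reproduce the ingrainer's soft outputs, which in the paper's setting is where the watermark resides.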

