LG AI CR | Oct 11, 2024

The Good, the Bad and the Ugly: Watermarks, Transferable Attacks and Adversarial Defenses

arXiv:2410.08864v1
Originality: Incremental advance
AI Analysis

This work addresses the fundamental trade-offs in securing machine learning models against attacks, with implications for AI safety and robustness, though it is theoretical and incremental in formalizing existing concepts.

The paper formalizes backdoor-based watermarks and adversarial defenses as interactive protocols. It shows that for almost every discriminative learning task, at least one of three schemes exists: a watermark, an adversarial defense, or a transferable attack. The proof that the third option is necessary uses homomorphic encryption, and the paper further links transferable attacks to cryptographic primitives.

We formalize and extend existing definitions of backdoor-based watermarks and adversarial defenses as interactive protocols between two players. The existence of these schemes is inherently tied to the learning tasks for which they are designed. Our main result shows that for almost every discriminative learning task, at least one of the two -- a watermark or an adversarial defense -- exists. The term "almost every" indicates that we also identify a third, counterintuitive but necessary option, i.e., a scheme we call a transferable attack. By transferable attack, we refer to an efficient algorithm computing queries that look indistinguishable from the data distribution and fool all efficient defenders. To this end, we prove the necessity of a transferable attack via a construction that uses a cryptographic tool called homomorphic encryption. Furthermore, we show that any task that satisfies our notion of a transferable attack implies a cryptographic primitive, thus requiring the underlying task to be computationally complex. These two facts imply an "equivalence" between the existence of transferable attacks and cryptography. Finally, we show that the class of tasks of bounded VC-dimension has an adversarial defense, and a subclass of them has a watermark.
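To make the "interactive protocol between two players" framing concrete, here is a minimal sketch of a classic backdoor-style ownership check. It is an illustration only and not the paper's formal definition: the toy threshold task and the names `ground_truth`, `generate_triggers`, `make_watermarked_model`, and `verify` are all assumptions introduced for this example. The owner plants a secret set of trigger inputs with deliberately wrong labels, then later queries a suspect model on those triggers and claims ownership if the suspect reproduces the planted labels.

```python
import random
from typing import Callable, Dict

Model = Callable[[float], int]


def ground_truth(x: float) -> int:
    """Hypothetical toy task: a threshold classifier on [0, 1)."""
    return int(x >= 0.5)


def generate_triggers(rng: random.Random, n: int) -> Dict[float, int]:
    """Owner's secret: n random inputs, each paired with a flipped label."""
    triggers: Dict[float, int] = {}
    while len(triggers) < n:
        x = rng.random()
        triggers[x] = 1 - ground_truth(x)
    return triggers


def make_watermarked_model(triggers: Dict[float, int]) -> Model:
    """Model that is correct everywhere except on the secret triggers."""
    def model(x: float) -> int:
        return triggers.get(x, ground_truth(x))
    return model


def verify(suspect: Model, triggers: Dict[float, int],
           threshold: float = 0.9) -> bool:
    """Owner queries the suspect on the triggers and accepts the ownership
    claim if the suspect matches the planted labels often enough."""
    hits = sum(suspect(x) == y for x, y in triggers.items())
    return hits / len(triggers) >= threshold
```

In this toy setting a copy of the owner's model passes the check, while an independently trained model that simply answers correctly fails it. The sketch deliberately ignores the properties the abstract is actually concerned with, such as whether the trigger queries are indistinguishable from the data distribution and whether an efficient adversary could remove the backdoor.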

