Imitating the Functionality of Image-to-Image Models Using a Single Example
Functionality imitation from published examples exposes a vulnerability in commercial AI models: it could enable unauthorized replication, a security concern for companies that invest in confidential models.
The paper tackles the problem of imitating proprietary image-to-image translation models when only a few input-output examples are available. It finds that a single example often suffices with a simple distillation approach, and supports this with extensive experiments across various architectures and tasks.
We study the possibility of imitating the functionality of an image-to-image translation model by observing input-output pairs. We focus on cases where training the model from scratch is impossible, either because training data are unavailable or because the model architecture is unknown. This is the case, for example, with commercial models for biological applications. Since the development of these models requires large investments, their owners commonly keep them confidential, and reveal only a few input-output examples on the company's website or in an academic paper. Surprisingly, we find that even a single example typically suffices for learning to imitate the model's functionality, and that this can be achieved using a simple distillation approach. We present an extensive ablation study encompassing a wide variety of model architectures, datasets and tasks, to characterize the factors affecting vulnerability to functionality imitation, and provide a preliminary theoretical discussion on the reasons for this unwanted behavior.
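To make the idea of distilling from a single input-output pair concrete, the following is a minimal toy sketch and not the paper's actual method, which the abstract does not specify. It assumes a hypothetical `teacher` that is just a fixed 3x3 filter, and a linear student trained by gradient descent on the mean squared error over all patches of one image. The names `teacher`, `extract_patches`, and `distill_from_single_pair` are illustrative assumptions; realistic targets and students would be deep networks.

```python
import numpy as np

def extract_patches(img, k=3):
    """Flatten every k x k window of a 2-D image into one row."""
    windows = np.lib.stride_tricks.sliding_window_view(img, (k, k))
    return windows.reshape(-1, k * k)

def teacher(img):
    """Hypothetical stand-in for the confidential model: a fixed 3x3 filter.

    Returns the filtered valid region as a flat vector.
    """
    kernel = np.array([[0.0, -1.0, 0.0],
                       [-1.0, 5.0, -1.0],
                       [0.0, -1.0, 0.0]])
    return extract_patches(img) @ kernel.ravel()

def distill_from_single_pair(x, y, steps=2000, lr=0.1):
    """Fit a linear 3x3 student to one (input, output) pair by gradient descent on MSE."""
    patches = extract_patches(x)        # every patch of the single input image
    w = np.zeros(patches.shape[1])      # student weights, initialised at zero
    for _ in range(steps):
        residual = patches @ w - y      # student output minus observed teacher output
        w -= lr * 2.0 * patches.T @ residual / len(patches)
    return w

# The only supervision: one observed input and the teacher's output on it.
rng = np.random.default_rng(0)
x = rng.random((32, 32))
y = teacher(x)
w = distill_from_single_pair(x, y)

# Compare student and teacher on an image that was never observed.
x_new = rng.random((32, 32))
error = np.max(np.abs(extract_patches(x_new) @ w - teacher(x_new)))
```

In this toy setting a single image already contains hundreds of overlapping patches, so one pair constrains a small local student heavily. Whether the same intuition explains the behavior reported for real models is an assumption of this sketch, not a claim taken from the abstract.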