CR CL LG

May 12, 2023

Two-in-One: A Model Hijacking Attack Against Text Generation Models

Wai Man Si, Michael Backes, Yang Zhang, Ahmed Salem

arXiv:2305.07406v1

15.229 citations

Originality: Incremental advance

AI Analysis

This work addresses security risks for users of text generation models by showing that hijacking attacks apply more broadly than previously demonstrated. The contribution is incremental, since it extends an existing attack to new domains.

The paper extends model hijacking attacks from image classification to text generation and classification models. It proposes Ditto, which successfully hijacks models trained for tasks such as language translation and summarization without compromising their utility.

Machine learning has progressed significantly in various applications ranging from face recognition to text generation. However, its success has been accompanied by different attacks. Recently, a new attack, namely the model hijacking attack, has been proposed, which raises both accountability and parasitic computing risks. Nevertheless, this attack has only focused on image classification tasks. In this work, we broaden the scope of this attack to include text generation and classification models, hence showing its broader applicability. More concretely, we propose a new model hijacking attack, Ditto, that can hijack different text classification tasks into multiple generation ones, e.g., language translation, text summarization, and language modeling. We use a range of text benchmark datasets such as SST-2, TweetEval, AGnews, QNLI, and IMDB to evaluate the performance of our attacks. Our results show that by using Ditto, an adversary can successfully hijack text generation models without jeopardizing their utility.
