CR LG · Jul 31, 2024

Vera Verto: Multimodal Hijacking Attack

Minxing Zhang, Ahmed Salem, Michael Backes, Yang Zhang

arXiv:2408.00129v14.21 citations · h-index: 17

Originality: Incremental advance

AI Analysis

This paper addresses security vulnerabilities in machine learning training pipelines and is relevant to practitioners deploying multimodal models, though it is an incremental extension of existing model hijacking attacks.

The paper extends model hijacking attacks from homogeneous-modality tasks to a multimodal setting in which an adversary implements a natural language processing hijacking task in an image classification model. It reports attack success rates of 94%, 94%, and 95% when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNIST classifiers, respectively.

The increasing cost of training machine learning (ML) models has led to the inclusion of new parties in the training pipeline, such as users who contribute training data and companies that provide computing resources. The involvement of these new parties in the ML training process has introduced new attack surfaces for an adversary to exploit. A recent attack in this domain is the model hijacking attack, whereby an adversary hijacks a victim model to implement their own (possibly malicious) hijacking tasks. However, the scope of the model hijacking attack has so far been limited to homogeneous-modality tasks. In this paper, we transform the model hijacking attack into a more general multimodal setting, where the hijacking and original tasks are performed on data of different modalities. Specifically, we focus on the setting where an adversary implements a natural language processing (NLP) hijacking task in an image classification model. To mount the attack, we propose a novel encoder-decoder based framework, namely the Blender, which relies on advanced image and language models. Experimental results show that our modal hijacking attack achieves strong performance in different settings. For instance, our attack achieves 94%, 94%, and 95% attack success rates when using the Sogou news dataset to hijack STL10, CIFAR-10, and MNIST classifiers, respectively.
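
The abstract does not define how the attack success rate is computed. One plausible reading, sketched below, is that the adversary keeps a private mapping from the victim model's image classes to the hijacking task's text classes, and the rate is the fraction of hijacking samples whose predicted image class maps back to the correct text label. The function name, the mapping, and the class names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Hashable, Sequence


def attack_success_rate(
    predicted_original_labels: Sequence[int],
    true_hijack_labels: Sequence[Hashable],
    label_map: Dict[int, Hashable],
) -> float:
    """Fraction of hijacking samples decoded to the correct hijacking label.

    Assumption (not from the paper): label_map sends each original-task
    class index to the hijacking-task label the adversary assigned to it.
    """
    if len(predicted_original_labels) != len(true_hijack_labels):
        raise ValueError("prediction and label sequences differ in length")
    if len(true_hijack_labels) == 0:
        raise ValueError("at least one sample is required")

    hits = 0
    for predicted, expected in zip(predicted_original_labels, true_hijack_labels):
        # Unmapped image classes count as a failed attack on that sample.
        if predicted in label_map and label_map[predicted] == expected:
            hits += 1
    return hits / len(true_hijack_labels)


# Illustrative values only: three image classes reused for three text topics.
example_map = {0: "sports", 1: "finance", 2: "technology"}
example_predictions = [0, 1, 2, 1]
example_truth = ["sports", "finance", "technology", "sports"]
example_rate = attack_success_rate(example_predictions, example_truth, example_map)
```

With these made-up values, three of the four samples are decoded correctly, so the rate is 0.75.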
