CL · Aug 26, 2024

TF-Attack: Transferable and Fast Adversarial Attacks on Large Language Models

arXiv:2408.13985v3 · 12 citations · h-index: 10
Originality: Highly original
AI Analysis

The paper addresses the need for more effective and efficient adversarial attacks on LLMs. The contribution is incremental in that it builds on existing attack methodologies.

The paper tackles the limited transferability and inefficiency of adversarial attacks on large language models (LLMs) by proposing TF-Attack, which uses an external LLM as an overseer to identify critical units and applies substitutions in parallel. The authors report attacks up to 20 times faster than earlier strategies while surpassing previous methods in transferability.

With the great advancements in large language models (LLMs), adversarial attacks against LLMs have recently attracted increasing attention. We found that pre-existing adversarial attack methodologies exhibit limited transferability and are notably inefficient, particularly when applied to LLMs. In this paper, we analyze the core mechanisms of previous predominant adversarial attack methods, revealing that 1) the distributions of importance scores differ markedly among victim models, restricting transferability; 2) the sequential attack process induces substantial time overheads. Based on the above two insights, we introduce a new scheme, named TF-Attack, for Transferable and Fast adversarial attacks on LLMs. TF-Attack employs an external LLM as a third-party overseer rather than the victim model to identify critical units within sentences. Moreover, TF-Attack introduces the concept of Importance Level, which allows for parallel substitutions of attacks. We conduct extensive experiments on 6 widely adopted benchmarks, evaluating the proposed method through both automatic and human metrics. Results show that our method consistently surpasses previous methods in transferability and delivers significant speed improvements, up to 20 times faster than earlier attack strategies.
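
The two ideas in the abstract, scoring words with a third-party overseer and substituting a whole Importance Level at once, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how Importance Levels are computed, so the equal-size bucketing below is an assumption, and `overseer_scores`, `substitutes`, and `is_fooled` are hypothetical stand-ins for the external LLM, a synonym source, and a query to the victim model.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def group_by_importance_level(scores: Sequence[float], num_levels: int) -> List[List[int]]:
    """Bucket token positions into levels, most important first.

    Assumption: levels are equal-size buckets of the ranked positions.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    size = max(1, -(-len(order) // num_levels))  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]


def level_parallel_attack(
    tokens: Sequence[str],
    overseer_scores: Callable[[Sequence[str]], Sequence[float]],  # hypothetical overseer
    substitutes: Dict[str, List[str]],                            # hypothetical synonym table
    is_fooled: Callable[[Sequence[str]], bool],                   # hypothetical victim query
    num_levels: int = 3,
) -> Tuple[Optional[List[str]], int]:
    """Substitute every word of one level together, then query the victim once."""
    levels = group_by_importance_level(overseer_scores(tokens), num_levels)
    current = list(tokens)
    queries = 0
    for positions in levels:
        # All positions in a level are independent, so they can be replaced in one pass.
        for pos in positions:
            candidates = substitutes.get(current[pos])
            if candidates:
                current[pos] = candidates[0]
        queries += 1  # one victim query per level instead of one per word
        if is_fooled(current):
            return current, queries
    return None, queries


# Toy usage with fixed scores in place of a real overseer LLM.
tokens = ["the", "movie", "was", "truly", "great"]
fixed_scores = lambda toks: [0.1, 0.5, 0.1, 0.7, 0.9]
synonyms = {"great": ["fine"], "truly": ["really"], "movie": ["film"]}
victim_flipped = lambda toks: "great" not in toks and "truly" not in toks
adversarial, num_queries = level_parallel_attack(tokens, fixed_scores, synonyms, victim_flipped)
```

Because the scores come from the overseer rather than from the victim, the same ranking can be reused against different victim models, and the victim is queried once per level rather than once per word. A sequential word-by-word loop would instead cost one query for each substitution.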

