CL, LG · Jun 12, 2024

Adversarial Evasion Attack Efficiency against Large Language Models

arXiv:2406.08050v1 · 5 citations
AI Analysis

The paper addresses vulnerabilities of LLMs used for text classification, which matters to developers who deploy them, but its contribution is incremental because it builds on existing adversarial attack research.

This paper analyzed the effectiveness, efficiency, and practicality of three adversarial attacks on five large language models for sentiment classification, finding that word-level attacks were more effective while character-level attacks were more practical with fewer perturbations and queries.

Large Language Models (LLMs) are valuable for text classification, but their vulnerabilities must not be disregarded. They lack robustness against adversarial examples, so it is pertinent to understand the impact of different types of perturbations and to assess whether those attacks could be replicated by common users with a small number of perturbations and a small number of queries to a deployed LLM. This work presents an analysis of the effectiveness, efficiency, and practicality of three different types of adversarial attacks against five different LLMs in a sentiment classification task. The results show that word-level and character-level attacks have very distinct impacts. The word-level attacks were more effective, but the character-level and more constrained attacks were more practical, requiring fewer perturbations and fewer queries. These differences need to be considered when developing adversarial defense strategies to train more robust LLMs for intelligent text classification applications.
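
The two efficiency measures named in the abstract, perturbations and queries, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the keyword-based `toy_classifier` stands in for a deployed LLM, the character swap stands in for a character-level attack, and none of the names come from the paper. It only shows how an attacker with label-only access might count queries and perturbations until the predicted label flips.

```python
from dataclasses import dataclass

# Hypothetical keyword lists for the toy model (not from the paper).
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "awful", "terrible", "hate"}


def toy_classifier(text: str) -> str:
    """Stand-in for a deployed LLM: returns only a sentiment label."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative"


@dataclass
class AttackResult:
    adversarial_text: str
    success: bool
    perturbations: int  # number of words modified
    queries: int        # number of classifier calls, including the first


def swap_first_two_chars(word: str) -> str:
    """Character-level perturbation: a typo that swaps the first two characters."""
    if len(word) < 2:
        return word
    return word[1] + word[0] + word[2:]


def char_level_attack(text: str) -> AttackResult:
    """Try one typo per word, left to right, and stop at the first label flip."""
    words = text.split()
    original_label = toy_classifier(text)
    queries = 1
    for i, word in enumerate(words):
        perturbed = swap_first_two_chars(word)
        if perturbed == word:
            continue  # the swap changed nothing, so do not spend a query
        candidate = words.copy()
        candidate[i] = perturbed
        candidate_text = " ".join(candidate)
        queries += 1
        if toy_classifier(candidate_text) != original_label:
            return AttackResult(candidate_text, True, 1, queries)
    return AttackResult(text, False, 0, queries)


if __name__ == "__main__":
    result = char_level_attack("the movie was great")
    print(result)
```

In this toy setting the attack succeeds with a single modified word, but only after querying the model once per candidate word. That query cost is what the abstract weighs against the attack's success rate.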

