CL · CR · LG · Mar 17, 2024

A Modified Word Saliency-Based Adversarial Attack on Text Classification Models

Hetvi Waghela, Sneha Rakshit, Jaydip Sen

arXiv:2403.11297v1 · 7.2 · 11 citations · h-index: 6

Originality: Incremental advance

AI Analysis

This work addresses the vulnerability of text classification models to adversarial attacks, a critical security issue for applications that rely on natural language processing. The contribution appears incremental, since it builds on existing word saliency concepts.

The paper generates adversarial examples for text classification models with a Modified Word Saliency-based Adversarial Attack (MWSAA), which refines traditional saliency-based methods to perturb input texts strategically. The authors report higher attack success rates and better preservation of text coherence than existing techniques.

This paper introduces a novel adversarial attack method targeting text classification models, termed the Modified Word Saliency-based Adversarial Attack (MWSAA). The technique builds upon the concept of word saliency to strategically perturb input texts, aiming to mislead classification models while preserving semantic coherence. By refining the traditional adversarial attack approach, MWSAA significantly enhances its efficacy in evading detection by classification systems. The methodology involves first identifying salient words in the input text through a saliency estimation process, which prioritizes words most influential to the model's decision-making process. Subsequently, these salient words are subjected to carefully crafted modifications, guided by semantic similarity metrics to ensure that the altered text remains coherent and retains its original meaning. Empirical evaluations conducted on diverse text classification datasets demonstrate the effectiveness of the proposed method in generating adversarial examples capable of successfully deceiving state-of-the-art classification models. Comparative analyses with existing adversarial attack techniques further indicate the superiority of the proposed approach in terms of both attack success rate and preservation of text coherence.
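The abstract describes two steps: rank words by saliency, then substitute the most salient ones under a semantic-similarity constraint. It does not say how MWSAA computes saliency or which similarity metric it uses. The sketch below is therefore a generic saliency-guided greedy substitution attack, not the authors' implementation. It makes several assumptions. Saliency is the common leave-one-out score, the drop in true-class probability when a word is replaced by an unknown token. The semantic filter is cosine similarity between word embeddings. Attack success rate is the fraction of originally correct predictions that the attack flips. The toy classifier, its hand-set `WEIGHTS` (chosen so the example flips), the `SYNONYMS` table, the 2-D `EMBED` vectors, the 0.9 threshold, and every function name are hypothetical placeholders.

```python
import math
from typing import Dict, List, Sequence, Tuple

UNK = "<unk>"  # placeholder token used to "delete" a word when scoring saliency

# Hypothetical toy classifier: bag-of-words logistic regression with hand-set
# weights, standing in for a real text classification model.
WEIGHTS: Dict[str, float] = {
    "great": 2.0, "good": 1.0, "fine": -0.5, "decent": -0.2,
    "huge": -3.0, "bad": -1.0,
}
# Hypothetical candidate substitutes and 2-D "embeddings" for the similarity check.
SYNONYMS: Dict[str, List[str]] = {
    "great": ["good", "fine", "huge"],
    "good": ["fine", "decent"],
}
EMBED: Dict[str, Tuple[float, float]] = {
    "great": (1.0, 0.2), "good": (0.9, 0.3), "fine": (0.8, 0.45),
    "decent": (0.7, 0.5), "huge": (0.1, 1.0),
}


def class_prob(tokens: Sequence[str], label: int) -> float:
    """Probability the toy model assigns to `label` (0 or 1)."""
    z = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    p_pos = 1.0 / (1.0 + math.exp(-z))
    return p_pos if label == 1 else 1.0 - p_pos


def predict(tokens: Sequence[str]) -> int:
    return 1 if class_prob(tokens, 1) >= 0.5 else 0


def word_saliency(tokens: List[str], label: int) -> List[float]:
    """Leave-one-out saliency: drop in true-class probability when token i becomes UNK."""
    base = class_prob(tokens, label)
    return [base - class_prob(tokens[:i] + [UNK] + tokens[i + 1:], label)
            for i in range(len(tokens))]


def cosine(u: Tuple[float, float], v: Tuple[float, float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def attack(tokens: List[str], label: int, max_changes: int = 3,
           sim_threshold: float = 0.9) -> List[str]:
    """Greedily substitute the most salient words until the prediction flips."""
    adv = list(tokens)  # copy, so the caller's input is not modified
    sal = word_saliency(adv, label)
    order = sorted(range(len(adv)), key=lambda i: sal[i], reverse=True)
    changes = 0
    for i in order:
        # Stop once the label flips, the edit budget is spent, or remaining words don't matter.
        if predict(adv) != label or changes >= max_changes or sal[i] <= 0:
            break
        best, best_p = None, class_prob(adv, label)
        for cand in SYNONYMS.get(adv[i], []):
            if adv[i] not in EMBED or cand not in EMBED:
                continue
            # Semantic filter: reject substitutes too far from the original word.
            if cosine(EMBED[adv[i]], EMBED[cand]) < sim_threshold:
                continue
            p = class_prob(adv[:i] + [cand] + adv[i + 1:], label)
            if p < best_p:  # keep the substitute that most lowers true-class probability
                best, best_p = cand, p
        if best is not None:
            adv[i] = best
            changes += 1
    return adv


def attack_success_rate(examples: List[Tuple[List[str], int]]) -> float:
    """Fraction of originally correct predictions that the attack flips."""
    correct = [(t, y) for t, y in examples if predict(t) == y]
    if not correct:
        return 0.0
    fooled = sum(predict(attack(t, y)) != y for t, y in correct)
    return fooled / len(correct)


sentence = ["a", "great", "and", "good", "movie"]
print(attack(sentence, 1))  # ['a', 'fine', 'and', 'fine', 'movie']
print(attack_success_rate([(sentence, 1), (["a", "bad", "movie"], 0)]))  # 0.5
```

On the toy sentence the attack first replaces the most salient word, "great", and then "good", flipping the prediction after two edits. Setting `sim_threshold=0.0` lets the sketch pick "huge" for "great". That flips the label in a single edit but changes the meaning. The similarity constraint in the abstract is meant to rule out exactly this kind of substitution.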
