CL · Feb 10, 2025

SMAB: MAB based word Sensitivity Estimation Framework and its Applications in Adversarial Text Generation

Saurabh Kumar Pandey, Sachin Vashistha, Debrup Das, Somak Aditya, Monojit Choudhury

arXiv:2502.07101v1 · 16.311 citations · h-index: 13 · Has Code · NAACL

Originality: Highly original

AI Analysis

This work is aimed at researchers and practitioners working on sequence classification, particularly adversarial text generation. It addresses efficient word-sensitivity estimation and provides an incremental improvement over existing methods.

The authors tackle the cost of calculating word sensitivity in sequence classification tasks. Guiding perturbation prompts with sensitivity values improves attack success rate in adversarial example generation by 15.58%, and using sensitivity as an additional reward in adversarial paraphrase generation gives a 12.00% improvement over SOTA approaches. Their framework, SMAB, provides a scalable approach for calculating word-level local and global sensitivities.

To understand the complexity of sequence classification tasks, Hahn et al. (2021) proposed sensitivity as the number of disjoint subsets of the input sequence that can each be individually changed to change the output. Though effective, calculating sensitivity at scale using this framework is costly because of its exponential time complexity. Therefore, we introduce a Sensitivity-based Multi-Armed Bandit framework (SMAB), which provides a scalable approach for calculating word-level local (sentence-level) and global (aggregated) sensitivities with respect to an underlying text classifier for any dataset. We establish the effectiveness of our approach through various applications. We perform a case study on a CHECKLIST-generated sentiment analysis dataset, where we show that our algorithm indeed captures intuitively high- and low-sensitivity words. Through experiments on multiple tasks and languages, we show that sensitivity can serve as a proxy for accuracy in the absence of gold data. Lastly, we show that guiding perturbation prompts using sensitivity values in adversarial example generation improves attack success rate by 15.58%, whereas using sensitivity as an additional reward in adversarial paraphrase generation gives a 12.00% improvement over SOTA approaches. Warning: Contains potentially offensive content.
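
The abstract does not spell out the bandit formulation, so the following is only a minimal sketch of how a multi-armed bandit could estimate word-level global sensitivity; it is not the authors' algorithm. It assumes each word is an arm, that a pull replaces one occurrence of the word and earns reward 1 when the classifier's label flips, and that arms are selected with UCB1. The toy classifier `classify`, the pool `REPLACEMENTS`, and the functions `pull` and `estimate_sensitivity` are all hypothetical names introduced for illustration.

```python
import math
import random

# Toy stand-in for a text classifier (assumption, for illustration only):
# predicts negative (0) if any cue word is present, otherwise positive (1).
NEGATIVE_CUES = {"bad", "awful", "boring"}

def classify(tokens):
    return 0 if any(t in NEGATIVE_CUES for t in tokens) else 1

# Hypothetical pool of substitute words used to perturb a sentence.
REPLACEMENTS = ["good", "bad", "the", "movie", "fine"]

def pull(word, sentences, rng):
    """Perturb one occurrence of `word`; reward is 1.0 if the label flips."""
    sentence = rng.choice([s for s in sentences if word in s])
    idx = sentence.index(word)
    substitute = rng.choice([r for r in REPLACEMENTS if r != word])
    perturbed = sentence[:idx] + [substitute] + sentence[idx + 1:]
    return 1.0 if classify(perturbed) != classify(sentence) else 0.0

def estimate_sensitivity(sentences, budget=2000, seed=0):
    """UCB1 over words; the running mean reward is the sensitivity estimate."""
    rng = random.Random(seed)
    arms = sorted({w for s in sentences for w in s})
    counts = {w: 0 for w in arms}
    means = {w: 0.0 for w in arms}
    for t in range(1, budget + 1):
        untried = [w for w in arms if counts[w] == 0]
        if untried:
            word = untried[0]  # play every arm once before using UCB
        else:
            word = max(
                arms,
                key=lambda w: means[w] + math.sqrt(2 * math.log(t) / counts[w]),
            )
        reward = pull(word, sentences, rng)
        counts[word] += 1
        means[word] += (reward - means[word]) / counts[word]  # incremental mean
    return means

corpus = [
    "the movie was bad".split(),
    "the movie was fine".split(),
    "the plot was boring".split(),
    "the acting was good".split(),
]
sensitivity = estimate_sensitivity(corpus)
for word, score in sorted(sensitivity.items(), key=lambda kv: -kv[1]):
    print(f"{word:8s} {score:.2f}")
```

Under these assumptions the cost grows with the pull budget rather than with the number of token subsets, which is the kind of saving a bandit formulation is meant to offer. Because UCB1 concentrates pulls on high-reward arms, estimates for low-sensitivity words rest on fewer samples and are correspondingly noisier.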
