On Robust Prefix-Tuning for Text Classification
This work addresses robustness issues in parameter-efficient finetuning for text classification and offers a solution that preserves efficiency and modularity. The contribution is incremental, since it builds on existing prefix-tuning methods.
The paper tackles the lack of robustness of prefix-tuning against textual adversarial attacks. It proposes a framework that uses the layerwise activations of correctly classified training data as the reference for additional prefix finetuning. On three benchmarks and against five attack types, the framework substantially improves robustness while maintaining comparable accuracy on clean text.
Recently, prefix-tuning has gained increasing attention as a parameter-efficient finetuning method for large-scale pretrained language models. The method keeps the pretrained model fixed and updates only the prefix token parameters for each downstream task. Despite being lightweight and modular, prefix-tuning still lacks robustness to textual adversarial attacks. Meanwhile, most existing defense techniques require updating and storing auxiliary models, which inevitably undermines the modularity and low storage cost of prefix-tuning. In this work, we propose a robust prefix-tuning framework that preserves the efficiency and modularity of prefix-tuning. The core idea of our framework is to use the layerwise activations that correctly classified training data induce in the language model as the standard for additional prefix finetuning. During the test phase, an extra batch-level prefix is tuned for each batch and added to the original prefix to enhance robustness. Extensive experiments on three text classification benchmarks show that our framework substantially improves robustness over several strong baselines against five textual attacks of different types while maintaining comparable accuracy on clean texts. We also interpret our robust prefix-tuning framework from the optimal control perspective and pose several directions for future research.
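The test-phase procedure described above can be illustrated with a small toy sketch. It is not the authors' implementation. It assumes a frozen stack of tanh layers in place of a pretrained language model, a per-layer bias vector in place of prefix tokens, the mean activation of correctly classified training examples as the "standard", and a squared-distance objective. The abstract specifies none of these choices, and all function names (forward, canonical_activations, mismatch, tune_batch_prefix, demo) are hypothetical.

```python
import numpy as np


def forward(x, weights, prefixes):
    """Run the frozen toy model and return the list of layerwise activations."""
    acts, h = [], x
    for W, p in zip(weights, prefixes):
        h = np.tanh(h @ W + p)  # p (shape: dim) is broadcast over the batch
        acts.append(h)
    return acts


def canonical_activations(x_train, y_train, weights, prefixes, readout):
    """Mean activation per layer over correctly classified training examples.

    Assumption: the mean is used as the reference; the source does not say how
    the activations are summarized.
    """
    acts = forward(x_train, weights, prefixes)
    preds = (acts[-1] @ readout).argmax(axis=1)
    correct = preds == y_train
    if not correct.any():
        raise ValueError("no correctly classified training examples")
    return [a[correct].mean(axis=0) for a in acts]


def mismatch(acts, canon):
    """Sum over layers of the batch-mean squared distance to the reference."""
    return sum(float(((a - c) ** 2).sum(axis=1).mean()) for a, c in zip(acts, canon))


def tune_batch_prefix(x_batch, weights, prefixes, canon, steps=50, lr=0.1):
    """Tune an extra batch-level prefix that is added to the original prefix.

    Simplification: each layer's delta follows the gradient of that layer's own
    distance term, holding the layer input fixed. The best deltas seen so far
    are kept, so the returned mismatch never exceeds the starting value.
    """
    deltas = [np.zeros_like(p) for p in prefixes]
    best = [d.copy() for d in deltas]
    best_loss = mismatch(forward(x_batch, weights, prefixes), canon)
    for _ in range(steps):
        tuned = [p + d for p, d in zip(prefixes, deltas)]
        acts = forward(x_batch, weights, tuned)
        for d, a, c in zip(deltas, acts, canon):
            # d/d(delta) of mean_b ||tanh(z_b + delta) - c||^2
            grad = (2.0 * (a - c) * (1.0 - a ** 2)).mean(axis=0)
            d -= lr * grad  # in-place update of this layer's delta
        tuned = [p + d for p, d in zip(prefixes, deltas)]
        loss = mismatch(forward(x_batch, weights, tuned), canon)
        if loss < best_loss:
            best_loss, best = loss, [d.copy() for d in deltas]
    return best, best_loss


def demo():
    """Build random toy data and tune a batch-level prefix for one test batch."""
    rng = np.random.default_rng(0)
    dim, n_layers, n_classes = 8, 2, 2
    weights = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(n_layers)]
    prefixes = [rng.normal(scale=0.1, size=dim) for _ in range(n_layers)]
    readout = rng.normal(size=(dim, n_classes))
    x_train = rng.normal(size=(64, dim))
    # Labels: the model's own predictions, with a few flipped to mimic errors.
    y_train = (forward(x_train, weights, prefixes)[-1] @ readout).argmax(axis=1)
    y_train[:10] = 1 - y_train[:10]
    canon = canonical_activations(x_train, y_train, weights, prefixes, readout)
    # Shifted inputs stand in for an adversarially perturbed test batch.
    x_batch = rng.normal(size=(16, dim)) + 0.5
    before = mismatch(forward(x_batch, weights, prefixes), canon)
    deltas, after = tune_batch_prefix(x_batch, weights, prefixes, canon)
    return before, after, deltas
```

Calling demo() returns the activation mismatch of the test batch before and after tuning, together with the per-layer batch-level prefixes. In this sketch the weights and the original prefixes are never modified, which mirrors the modularity argument in the abstract: only the small per-batch additions change at test time.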