CL | LG | Dec 7, 2023

RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training

arXiv:2312.04032v1 | 131 citations | h-index: 29 | EMNLP
Originality: Incremental advance
AI Analysis

The paper addresses robustness issues that matter to NLP practitioners, such as vulnerability to adversarial attacks and poor calibration, but the advance is incremental because it builds on existing fine-tuning approaches.

The paper tackles the problem of improving multi-perspective robustness in fine-tuned language models by proposing RoAST, a technique that uses adversarial perturbation with selective training, and demonstrates its effectiveness compared to state-of-the-art methods on six LMs.

Fine-tuning pre-trained language models (LMs) has become the de facto standard in many NLP tasks. Nevertheless, fine-tuned LMs are still prone to robustness issues, such as weak adversarial robustness and poor model calibration. Several perspectives of robustness for LMs have been studied independently, but a unified consideration across multiple perspectives is lacking. In this paper, we propose Robustifying LMs via Adversarial perturbation with Selective Training (RoAST), a simple yet effective fine-tuning technique to enhance the multi-perspective robustness of LMs in a unified way. RoAST effectively incorporates two important sources of model robustness: robustness on perturbed inputs and generalizable knowledge in pre-trained LMs. To be specific, RoAST introduces adversarial perturbation during fine-tuning, while the model parameters are selectively updated based on their relative importance to minimize unnecessary deviation from the pre-trained model. Under a unified evaluation of fine-tuned LMs that incorporates four representative perspectives of model robustness, we demonstrate the effectiveness of RoAST compared to state-of-the-art fine-tuning methods on six different types of LMs, which indicates its usefulness in practice.
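The two ingredients named in the abstract, adversarial perturbation of the inputs and selective parameter updates, can be illustrated on a toy model. The following is a minimal sketch only and not the authors' implementation: it swaps the language model for a small logistic regression, and it assumes a single normalized gradient-ascent step as the perturbation and squared gradient magnitude as the "relative importance" score. The function names and the parameters `eps` and `keep_ratio` are hypothetical and do not come from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grads(w, x, y):
    """Binary cross-entropy of a linear model; returns (loss, dL/dw, dL/dx)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep log() finite
    loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    err = p - y
    return loss, [err * xi for xi in x], [err * wi for wi in w]

def adversarial_input(w, x, y, eps):
    """Assumed perturbation: one loss-ascent step on x with L2 norm eps."""
    _, _, gx = loss_and_grads(w, x, y)
    norm = math.sqrt(sum(g * g for g in gx)) or 1.0  # zero gradient: no change
    return [xi + eps * g / norm for xi, g in zip(x, gx)]

def selective_step(w, grad, lr, keep_ratio):
    """Update only the keep_ratio fraction of weights with the largest
    squared gradient (an assumed importance score); freeze the rest."""
    k = max(1, int(round(keep_ratio * len(w))))
    ranked = sorted(range(len(w)), key=lambda i: grad[i] ** 2, reverse=True)
    selected = set(ranked[:k])
    new_w = [wi - lr * gi if i in selected else wi
             for i, (wi, gi) in enumerate(zip(w, grad))]
    return new_w, selected

def train(data, dim, epochs=50, lr=0.5, eps=0.1, keep_ratio=0.5):
    """Fine-tuning loop: clean + adversarial gradients, masked update."""
    w = [0.0] * dim  # stands in for the pre-trained weights
    for _ in range(epochs):
        for x, y in data:
            x_adv = adversarial_input(w, x, y, eps)
            _, g_clean, _ = loss_and_grads(w, x, y)
            _, g_adv, _ = loss_and_grads(w, x_adv, y)
            grad = [a + b for a, b in zip(g_clean, g_adv)]
            w, _ = selective_step(w, grad, lr, keep_ratio)
    return w

# Toy data: the label follows the sign of the first feature.
TOY_DATA = [
    ([1.0, 0.2, 0.0, 0.1], 1),
    ([-1.0, 0.1, 0.0, -0.2], 0),
    ([0.8, -0.3, 0.0, 0.0], 1),
    ([-0.9, -0.1, 0.0, 0.2], 0),
]

if __name__ == "__main__":
    print(train(TOY_DATA, dim=4))
```

In this sketch, weights that are not selected at a step keep their previous values, which is one simple way to limit deviation from the starting point. How the paper actually scores importance and builds the update mask should be taken from the paper and its repository rather than from this example.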

Code Implementations: 1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
