CL · AI · May 2

Where Do Prompt Perturbations Break Generation? A Segment-Level View of Robustness in LoRA-Tuned Language Models

Zhuoyun Li, Boxuan Wang, Jinwei Hu, Zhenglin Huang, Qisong He, Xinmiao Huang, Guangliang Cheng, Xiaowei Huang, Yi Dong

arXiv:2605.01605 · 86.21 citations

Predicted impact: top 47% in CL · last 90 days · Originality: Incremental advance

AI Analysis

For practitioners fine-tuning LLMs with LoRA, this work targets a failure mode in which whole-sequence consistency masks drift on a critical entity, relation, or conclusion, and offers a finer-grained, segment-level robustness method.

The paper introduces S$^2$R$^2$, a segment-level robustness framework for LoRA-tuned language models that penalizes the largest meaning drifts between clean and perturbed generations. It improves robustness under typographical noise, deletion, synonym replacement, and paraphrasing while keeping clean performance competitive, and it transfers across datasets better than consistency-based baselines.

Large language models are sensitive to minor prompt perturbations, yet existing robustness methods usually enforce consistency at the whole-sequence level. This holistic view can hide an important failure mode: a perturbed response may remain globally similar to the clean one while drifting on a critical entity, relation, or conclusion. We introduce S$^2$R$^2$, a segment-level framework for robust LoRA fine-tuning. S$^2$R$^2$ decomposes clean and perturbed generations into semantic segments, aligns them with an optimal-transport objective, and penalises the segments with the largest meaning drift. To connect this output-side objective with model adaptation, we add an adapter-stability regulariser motivated by segment-level attention reallocation, using LoRA norm control as a tractable proxy for limiting perturbation-amplified evidence shifts. A PAC-Bayesian complexity view further explains why controlling adapter growth may support transfer beyond observed perturbations. Experiments on summarisation benchmarks show that S$^2$R$^2$ improves robustness under typographical noise, deletion, synonym replacement, and paraphrasing, while maintaining competitive clean performance and achieving stronger cross-dataset transfer than consistency-based baselines.
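The abstract describes the segment-level objective only at a high level, so the following is a minimal, hypothetical sketch and not the authors' implementation. It assumes each generation has already been split into segments and embedded as vectors (the segmenter and encoder are left unspecified). It then uses cosine distance as the segment cost, entropic optimal transport (Sinkhorn iterations with uniform marginals) for the alignment, the mean of the k largest per-segment drifts as the "largest meaning drift" penalty, and a squared Frobenius norm on the LoRA factors as one plausible form of "LoRA norm control". All function names (`segment_drifts`, `topk_drift_penalty`, `lora_norm_penalty`, `robust_loss`) and weights (`eps`, `k`, `lam_seg`, `lam_lora`) are illustrative assumptions, as is the additive way the terms are combined. Pure Python keeps it dependency-free; a training implementation would use differentiable tensors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]
Matrix = List[List[float]]


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity: 0 for identical directions, 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 1.0  # a zero embedding is treated as unrelated to everything
    return 1.0 - dot / (nu * nv)


def sinkhorn(cost: Matrix, eps: float = 0.05, iters: int = 200) -> Matrix:
    """Entropic OT plan between uniform marginals on clean (rows) and perturbed (columns) segments."""
    n, m = len(cost), len(cost[0])
    a, b = [1.0 / n] * n, [1.0 / m] * m
    kernel = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns towards the target marginals.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def segment_drifts(clean: List[Vector], perturbed: List[Vector], eps: float = 0.05) -> List[float]:
    """Per clean segment: transport-weighted average cost to the perturbed segments it is aligned with."""
    cost = [[cosine_distance(c, p) for p in perturbed] for c in clean]
    plan = sinkhorn(cost, eps)
    drifts = []
    for plan_row, cost_row in zip(plan, cost):
        mass = sum(plan_row)
        drifts.append(sum(p * c for p, c in zip(plan_row, cost_row)) / mass)
    return drifts


def topk_drift_penalty(drifts: List[float], k: int = 2) -> float:
    """Mean of the k largest drifts, so only the worst-preserved segments are penalised."""
    worst = sorted(drifts, reverse=True)[:k]
    return sum(worst) / len(worst)


def lora_norm_penalty(lora_a: Matrix, lora_b: Matrix) -> float:
    """Sum of squared Frobenius norms of the LoRA factors A (r x d_in) and B (d_out x r)."""
    return sum(x * x for row in lora_a for x in row) + sum(x * x for row in lora_b for x in row)


def robust_loss(task_loss: float, clean: List[Vector], perturbed: List[Vector],
                lora_a: Matrix, lora_b: Matrix,
                lam_seg: float = 1.0, lam_lora: float = 1e-3, k: int = 2) -> float:
    """Task loss + top-k segment drift penalty + LoRA norm penalty (all weights hypothetical)."""
    drift_term = topk_drift_penalty(segment_drifts(clean, perturbed), k)
    return task_loss + lam_seg * drift_term + lam_lora * lora_norm_penalty(lora_a, lora_b)


if __name__ == "__main__":
    clean = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Third perturbed segment collapses onto the second: a stand-in for an entity drift.
    drifted = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    print(segment_drifts(clean, clean))
    print(segment_drifts(clean, drifted))
    print(robust_loss(0.5, clean, drifted, [[0.1, 0.2]], [[0.3], [0.4]]))
```

With identical clean and perturbed segments every drift is close to zero. When the third perturbed segment collapses onto the second, that clean segment's drift rises to 1 and dominates the top-k term even though two of the three segments still match, which is the kind of localized failure a whole-sequence similarity score can dilute. On the norm term: for a Gaussian prior N(0, σ²I) and posterior N(θ, σ²I), the KL divergence that appears in PAC-Bayesian bounds equals ||θ||²/(2σ²), which is one standard way in which limiting adapter norms shrinks the complexity term.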
