Annotation Entropy Predicts Per-Example Learning Dynamics in LoRA Fine-Tuning
This work addresses per-example learning dynamics in fine-tuning, a question of interest to NLP researchers, and reveals a novel effect that is largely specific to LoRA methods.
The study finds that LoRA fine-tuning un-learns examples with high annotator disagreement: their loss increases during training, a pattern largely absent under full fine-tuning and consistent across six models. Correlations between annotation entropy and area under the loss curve range from Spearman ρ = 0.06 to 0.43.
We find that LoRA fine-tuning exhibits un-learning on contested examples: items with high annotator disagreement show increasing loss during training, a qualitatively distinct pattern largely absent under full fine-tuning and consistent across all six models tested (four encoder, two decoder-only). This discovery emerges from correlating annotation entropy, computed from ChaosNLI's 100 labels per example, with per-example area under the loss curve (AULC) on SNLI and MNLI. The correlation is positive in all 25 conditions tested (Spearman $\rho = 0.06$ to $0.43$), with decoder-only models showing stronger correlations than encoders at matched LoRA rank. The effect survives partial-correlation controls and replicates across seeds and datasets. A preliminary noise-injection experiment is consistent with these findings.
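The core measurement described above (annotation entropy, per-example AULC, and their rank correlation) can be illustrated with a minimal sketch. It is not the pipeline used in the study: the entropy base (bits), the trapezoidal rule with unit spacing between loss checkpoints, and the function names are all assumptions made for illustration, and the toy label counts and loss curves are invented.

```python
import math


def annotation_entropy(label_counts):
    """Shannon entropy in bits of an empirical label distribution.

    Assumption: the log base is not stated in the abstract; base 2 is used here.
    """
    total = sum(label_counts)
    probs = [c / total for c in label_counts if c > 0]
    return sum(p * math.log2(1.0 / p) for p in probs)


def aulc(losses):
    """Area under a per-example loss curve.

    Assumption: trapezoidal rule with unit spacing between checkpoints.
    """
    return sum((a + b) / 2.0 for a, b in zip(losses, losses[1:]))


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0.0 or sy == 0.0:
        return float("nan")  # undefined when one variable is constant
    return cov / (sx * sy)


# Invented toy data: 100 annotator labels per example over three NLI classes,
# plus a made-up loss curve per example recorded at three checkpoints.
toy_counts = [[100, 0, 0], [50, 50, 0], [34, 33, 33]]
toy_losses = [[1.0, 0.5, 0.2], [1.0, 0.9, 0.8], [1.0, 1.2, 1.5]]

entropies = [annotation_entropy(c) for c in toy_counts]
areas = [aulc(curve) for curve in toy_losses]
print(spearman_rho(entropies, areas))
```

In this toy setup the most contested example also has a rising loss curve, so it accumulates the largest AULC. A positive rank correlation between `entropies` and `areas` is the kind of signal the abstract reports, although the real analysis additionally applies partial-correlation controls that this sketch omits.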