CL | May 23, 2022

Representation Projection Invariance Mitigates Representation Collapse

arXiv:2205.11603v3 | 137 citations | h-index: 47
Originality: Incremental advance
AI Analysis

This work addresses representation degradation in NLP fine-tuning, which can cause instability and poor generalization, offering a solution that is incremental but effective across multiple benchmarks.

The paper tackles representation collapse during fine-tuning of pre-trained language models by proposing REPINA, a regularization method that maintains representation information and reduces undesirable changes. The method outperformed 5 baselines on 10 out of 13 language understanding tasks, showing improved in-domain performance, few-shot effectiveness, and robustness to label perturbations.

Fine-tuning contextualized representations learned by pre-trained language models remains a prevalent practice in NLP. However, fine-tuning can lead to representation degradation (also known as representation collapse), which may result in instability, sub-optimal performance, and weak generalization. In this paper, we propose Representation Projection Invariance (REPINA), a novel regularization method to maintain the information content of representation and reduce representation collapse during fine-tuning by discouraging undesirable changes in the representations. We study the empirical behavior of the proposed regularization in comparison to 5 comparable baselines across 13 language understanding tasks (GLUE benchmark and six additional datasets). When evaluating in-domain performance, REPINA consistently outperforms other baselines on most tasks (10 out of 13). We also demonstrate its effectiveness in few-shot settings and robustness to label perturbation. As a by-product, we extend previous studies of representation collapse and propose several metrics to quantify it. Our empirical findings show that our approach is significantly more effective at mitigating representation collapse.
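To make the idea in the abstract concrete, here is a minimal sketch of a regularizer that discourages fine-tuned representations from drifting away from (a projection of) the pre-trained ones. The abstract does not state the exact formulation, so the squared-distance penalty, the optional `projection` callable, the `weight` value, and all function names below are illustrative assumptions rather than the authors' implementation.

```python
from typing import Callable, Optional, Sequence

Vector = Sequence[float]


def projection_invariance_penalty(
    pretrained: Sequence[Vector],
    finetuned: Sequence[Vector],
    projection: Optional[Callable[[Vector], Vector]] = None,
) -> float:
    """Mean squared distance between projected pre-trained and fine-tuned vectors.

    Hypothetical helper: `projection` stands in for whatever map is applied to
    the pre-trained representation; None means the identity map.
    """
    if len(pretrained) != len(finetuned) or not pretrained:
        raise ValueError("need two non-empty batches of equal size")
    total = 0.0
    for pre_vec, fin_vec in zip(pretrained, finetuned):
        projected = pre_vec if projection is None else projection(pre_vec)
        if len(projected) != len(fin_vec):
            raise ValueError("projected and fine-tuned vectors differ in length")
        # Squared Euclidean distance for one example.
        total += sum((p - f) ** 2 for p, f in zip(projected, fin_vec))
    return total / len(pretrained)


def regularized_loss(
    task_loss: float,
    pretrained: Sequence[Vector],
    finetuned: Sequence[Vector],
    weight: float = 0.1,  # assumed regularization strength, not from the paper
    projection: Optional[Callable[[Vector], Vector]] = None,
) -> float:
    """Task loss plus the weighted representation-drift penalty."""
    penalty = projection_invariance_penalty(pretrained, finetuned, projection)
    return task_loss + weight * penalty


# Toy batch of two 2-dimensional representations.
pre = [[1.0, 0.0], [0.0, 1.0]]
fin = [[2.0, 0.0], [0.0, 1.0]]
drift = projection_invariance_penalty(pre, fin)
total = regularized_loss(1.0, pre, fin, weight=0.1)
```

In an actual fine-tuning loop the same penalty would be computed on tensors inside the training step, with the pre-trained encoder kept frozen as the reference; the pure-Python version here only shows the arithmetic.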

