CL · CV · Oct 14, 2022

Pretrained Transformers Do not Always Improve Robustness

arXiv:2210.07663v1 · 2 citations · h-index: 30
Originality Synthesis-oriented
AI Analysis

This work addresses the robustness of NLP models in real-world applications where data can be noisy. It reveals a limitation of Pretrained Transformers (PT), a contribution that is incremental to existing knowledge.

The study asks whether Pretrained Transformers (PT) are more robust to noisy data than traditional models. It finds empirical evidence that PT provide less robust representations when exposed to noisy data, and that adversarial filtering fails to improve robustness because the filter itself is fooled by the noise.

Pretrained Transformers (PT) have been shown to improve Out of Distribution (OOD) robustness over traditional models such as Bag of Words (BOW), LSTMs, and Convolutional Neural Networks (CNN) powered by Word2Vec and GloVe embeddings. How does this robustness comparison hold in a real-world setting where some part of the dataset can be noisy? Do PT also provide more robust representations than traditional models when exposed to noisy data? We perform a comparative study on 10 models and find empirical evidence that PT provide less robust representations than traditional models when exposed to noisy data. We investigate further and augment PT with an adversarial filtering (AF) mechanism that has been shown to improve OOD generalization. However, an increase in generalization does not necessarily increase robustness, as we find that noisy data fools the AF method powered by PT.
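
The abstract does not say how noise enters the dataset or how robustness is scored. As a rough illustration only, the sketch below assumes label noise (a fraction of training labels flipped to a different class) and measures robustness as the accuracy lost between clean and noisy training. The names `flip_labels` and `robustness_gap` are hypothetical and are not taken from the paper.

```python
import random


def flip_labels(labels, noise_rate, num_classes, seed=0):
    """Return a copy of `labels` with a fraction `noise_rate` of entries
    reassigned to a different class (assumed noise model, not the paper's)."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    # Pick distinct positions to corrupt, then swap in any other class.
    for i in rng.sample(range(len(noisy)), n_flip):
        alternatives = [c for c in range(num_classes) if c != noisy[i]]
        noisy[i] = rng.choice(alternatives)
    return noisy


def robustness_gap(clean_acc, noisy_acc):
    """Accuracy lost when training on noisy instead of clean data.
    A smaller gap means the model is more robust under this assumed measure."""
    return clean_acc - noisy_acc


if __name__ == "__main__":
    labels = [0, 1] * 50
    noisy = flip_labels(labels, noise_rate=0.2, num_classes=2)
    changed = sum(a != b for a, b in zip(labels, noisy))
    print(f"{changed} of {len(labels)} labels flipped")
```

Under this setup, one would train each of the compared models once on the clean labels and once on the flipped labels, then compare their `robustness_gap` values on the same held-out test set.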
