CL, AI | Oct 30, 2024

Teaching a Language Model to Distinguish Between Similar Details using a Small Adversarial Training Set

arXiv:2410.23118v1
Originality: Incremental advance
AI Analysis

This work addresses robustness in natural language inference (NLI) for AI systems, but it is an incremental advance because it builds on existing adversarial training methods.

The paper tackles the poor performance of language models on adversarial examples by fine-tuning a model on a small adversarial training set. This yields a 13% accuracy increase on the adversarial test set and an improvement from 91.2% to 92.9% on the most similar contradictions in the SNLI test set.

Language models can achieve high accuracy on natural language tasks such as NLI, but their performance suffers on manually created adversarial examples. We investigate the performance of a language model trained on the Stanford Natural Language Inference (SNLI) corpus on a manually created adversarial test set. We then improve the model's performance by fine-tuning it on a small, manually created adversarial training set designed to help the language model learn to differentiate between similar words and phrases in the data. We show an increase in accuracy on the adversarial test set (+13%) while maintaining good performance on the original NLI task. We also show an increase in accuracy from 91.2% to 92.9% on the most similar contradictions in the SNLI test set (as judged by cosine similarity).
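The abstract evaluates on "the most similar contradictions" in SNLI, ranked by cosine similarity, but does not say which text representation or cutoff was used. The sketch below illustrates the general idea under stated assumptions: it uses plain bag-of-words count vectors (an assumption, not necessarily the paper's representation), ranks contradiction pairs by premise/hypothesis cosine similarity, keeps a top fraction, and measures accuracy on that subset. The function names and the `fraction` parameter are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased token counts (assumed representation, for illustration only)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the bag-of-words vectors of two sentences."""
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_contradictions(examples, fraction=0.1):
    """Hypothetical helper: keep the top `fraction` of contradiction pairs,
    ranked by premise/hypothesis cosine similarity (most similar first)."""
    contradictions = [ex for ex in examples if ex["label"] == "contradiction"]
    ranked = sorted(
        contradictions,
        key=lambda ex: cosine_similarity(ex["premise"], ex["hypothesis"]),
        reverse=True,
    )
    k = max(1, int(len(ranked) * fraction)) if ranked else 0
    return ranked[:k]


def accuracy(examples, predict):
    """Fraction of examples where predict(premise, hypothesis) matches the gold label."""
    if not examples:
        return 0.0
    correct = sum(predict(ex["premise"], ex["hypothesis"]) == ex["label"] for ex in examples)
    return correct / len(examples)
```

Pairs such as "a red guitar" vs. "a blue guitar" share almost every token, so they score high on this measure; these are the near-duplicate contradictions where a single differing word decides the label, which is the distinction the adversarial training set targets. A real evaluation would pass the fine-tuned model's prediction function as `predict`.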
