IR | CL | LG | Jan 28

One Word is Enough: Minimal Adversarial Perturbations for Neural Text Ranking

arXiv:2601.20283v1 | h-index: 10
Originality: Incremental advance
AI Analysis

This work addresses the robustness of neural ranking systems in information retrieval. It demonstrates practical risks, with an incremental improvement in attack efficiency.

The paper examines the vulnerability of neural ranking models to adversarial attacks. It introduces a minimal, query-aware attack that inserts or substitutes a single word in a document, achieving up to a 91% success rate while averaging fewer than two token edits per document on the TREC-DL 2019/2020 benchmarks.

Neural ranking models (NRMs) achieve strong retrieval effectiveness, yet prior work has shown they are vulnerable to adversarial perturbations. We revisit this robustness question with a minimal, query-aware attack that promotes a target document by inserting or substituting a single, semantically aligned word (the query center). We study heuristic and gradient-guided variants, including a white-box method that identifies influential insertion points. On TREC-DL 2019/2020 with BERT and monoT5 re-rankers, our single-word attacks achieve up to 91% success while modifying fewer than two tokens per document on average. They achieve competitive rank and score boosts with far fewer edits than PRADA, which we compare against under a comparable white-box setup to ensure fair evaluation. We also introduce new diagnostic metrics to analyze attack sensitivity beyond aggregate success rates. Our analysis reveals a Goldilocks zone in which mid-ranked documents are most vulnerable. These findings demonstrate practical risks and motivate future defenses for robust neural ranking.
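
To make the idea of a single-word insertion attack concrete, here is a minimal sketch in Python. It is not the authors' implementation. The abstract does not define how the query center is chosen, so `pick_query_center` assumes one plausible heuristic (the query word whose embedding is closest to the mean query embedding). `toy_score` is a hypothetical position-weighted term-match scorer standing in for a BERT or monoT5 re-ranker, and the embeddings are made-up toy vectors. The insertion search is a brute-force loop over positions, not the gradient-guided white-box variant.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pick_query_center(query_tokens, embeddings):
    """ASSUMED heuristic: the query word closest to the mean query embedding."""
    known = [t for t in query_tokens if t in embeddings]
    if not known:
        raise ValueError("no query token has an embedding")
    dim = len(embeddings[known[0]])
    mean = [sum(embeddings[t][i] for t in known) / len(known) for i in range(dim)]
    return max(known, key=lambda t: cosine(embeddings[t], mean))


def toy_score(query_tokens, doc_tokens):
    """HYPOTHETICAL stand-in for a re-ranker: query-term matches,
    weighted so that earlier positions count more."""
    query_terms = set(query_tokens)
    return sum(1.0 / (1 + i) for i, tok in enumerate(doc_tokens) if tok in query_terms)


def single_word_insert(query_tokens, doc_tokens, word, score_fn):
    """Try every insertion point; keep the first one with the highest score."""
    best_doc, best_score = list(doc_tokens), score_fn(query_tokens, doc_tokens)
    for pos in range(len(doc_tokens) + 1):
        candidate = doc_tokens[:pos] + [word] + doc_tokens[pos:]
        score = score_fn(query_tokens, candidate)
        if score > best_score:
            best_doc, best_score = candidate, score
    return best_doc, best_score


# Toy data: 2-d vectors invented purely for illustration.
TOY_EMBEDDINGS = {
    "heart": [0.9, 0.3],
    "disease": [0.8, 0.5],
    "causes": [0.2, 0.9],
}

query = "heart disease causes".split()
doc = "regular exercise lowers blood pressure".split()

center = pick_query_center(query, TOY_EMBEDDINGS)
attacked_doc, attacked_score = single_word_insert(query, doc, center, toy_score)
print(center, attacked_doc, attacked_score)
```

With a real re-ranker, `score_fn` would wrap a model's relevance score for the query-document pair, and an attack would count as successful when the perturbed document's rank improves. The brute-force loop costs one scoring call per insertion position, which is presumably why a gradient-guided method for locating influential insertion points is attractive.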

