IR | CL | Aug 21, 2025

Adversarial Attacks against Neural Ranking Models via In-Context Learning

Amin Bigdeli, Negar Arabzadeh, Ebrahim Bagheri, Charles L. A. Clarke

arXiv:2508.15283v1 | 13.25 citations | h-index: 17 | Has Code | SIGIR-AP

Originality: Highly original

AI Analysis

This paper addresses a security threat to neural retrieval systems: it demonstrates a scalable attack method that works without gradient access to the target ranker.

The paper tackles the vulnerability of neural ranking models to adversarial manipulation by introducing Few-Shot Adversarial Prompting (FSAP), a black-box attack framework that uses in-context learning with LLMs to generate high-ranking adversarial documents. In experiments on the TREC 2020 and 2021 Health Misinformation Tracks, FSAP-generated documents consistently outrank credible ones across four neural ranking models.
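
One way to make "outrank" concrete is a pairwise win rate over ranker scores. The sketch below is only an illustration of that idea, not the paper's evaluation protocol: the function name `outrank_rate` and the choice of a strict pairwise comparison are assumptions, and the scores in the usage line are made-up numbers.

```python
from typing import Sequence


def outrank_rate(
    adversarial_scores: Sequence[float],
    credible_scores: Sequence[float],
) -> float:
    """Fraction of (adversarial, credible) pairs in which the adversarial
    document receives a strictly higher ranker score for the same query.

    Hypothetical metric for illustration; the source does not specify
    how outranking is measured.
    """
    if not adversarial_scores or not credible_scores:
        raise ValueError("both score lists must be non-empty")
    # Count every pair where the adversarial score beats the credible score.
    wins = sum(
        1
        for a in adversarial_scores
        for c in credible_scores
        if a > c
    )
    return wins / (len(adversarial_scores) * len(credible_scores))


# Illustrative scores only: 5 of the 6 pairs favor the adversarial side.
rate = outrank_rate([0.9, 0.4], [0.5, 0.3, 0.2])
```

A value near 1.0 would mean the ranker almost always prefers the adversarial documents, which is the failure a defender would want to monitor for.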

While neural ranking models (NRMs) have shown high effectiveness, they remain susceptible to adversarial manipulation. In this work, we introduce Few-Shot Adversarial Prompting (FSAP), a novel black-box attack framework that leverages the in-context learning capabilities of Large Language Models (LLMs) to generate high-ranking adversarial documents. Unlike previous approaches that rely on token-level perturbations or manual rewriting of existing documents, FSAP formulates adversarial attacks entirely through few-shot prompting, requiring no gradient access or internal model instrumentation. By conditioning the LLM on a small support set of previously observed harmful examples, FSAP synthesizes grammatically fluent and topically coherent documents that subtly embed false or misleading information and rank competitively against authentic content. We instantiate FSAP in two modes: FSAP-IntraQ, which leverages harmful examples from the same query to enhance topic fidelity, and FSAP-InterQ, which enables broader generalization by transferring adversarial patterns across unrelated queries. Our experiments on the TREC 2020 and 2021 Health Misinformation Tracks, using four diverse neural ranking models, reveal that FSAP-generated documents consistently outrank credible, factually accurate documents. Furthermore, our analysis demonstrates that these adversarial outputs exhibit strong stance alignment and low detectability, posing a realistic and scalable threat to neural retrieval systems. FSAP also effectively generalizes across both proprietary and open-source LLMs.
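
The difference between the two modes comes down to where the support set is drawn from. The sketch below shows only that selection step, under stated assumptions: the `Example` record, the `select_support_set` function, the `"intra"`/`"inter"` mode strings, and the "take the first k" rule are all hypothetical, since the abstract does not say how examples are chosen or how many are used.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Example:
    """A previously observed example, identified by query and document ID."""
    query_id: str
    doc_id: str


def select_support_set(
    pool: List[Example],
    target_query_id: str,
    mode: str,
    k: int,
) -> List[Example]:
    """Pick up to k support examples for a target query.

    mode="intra": examples from the same query (FSAP-IntraQ in the abstract).
    mode="inter": examples from other queries (FSAP-InterQ in the abstract).
    Taking the first k candidates is a placeholder, not the paper's rule.
    """
    if mode == "intra":
        candidates = [e for e in pool if e.query_id == target_query_id]
    elif mode == "inter":
        candidates = [e for e in pool if e.query_id != target_query_id]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return candidates[:k]


# Hypothetical pool with placeholder IDs.
pool = [
    Example("q1", "d1"),
    Example("q1", "d2"),
    Example("q2", "d3"),
    Example("q3", "d4"),
]
intra = select_support_set(pool, "q1", mode="intra", k=2)
inter = select_support_set(pool, "q1", mode="inter", k=2)
```

Here `intra` holds only `q1` examples, matching the abstract's point about topic fidelity, while `inter` holds examples from unrelated queries, matching its point about cross-query transfer.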
