IR, CL, Jan 8, 2025

Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval

arXiv:2501.04802v2, 6 citations, h-index: 9, ECIR
Originality: Synthesis-oriented
AI Analysis

This work addresses efficiency and practicality issues in adversarial attacks for dense retrieval systems, but it is incremental as it builds on an existing method.

The paper tackles two weaknesses of HotFlip for corpus poisoning attacks in dense retrieval: its computational inefficiency and its unrealistic assumption of access to user queries. It reduces adversarial generation time from 4 hours to 15 minutes per document on the same hardware (roughly a 16-fold speedup) and evaluates the attack in black-box and query-agnostic settings.

HotFlip is a topical gradient-based word substitution method for attacking language models. Recently, this method has been further applied to attack retrieval systems by generating malicious passages that are injected into a corpus, i.e., corpus poisoning. However, HotFlip is known to be computationally inefficient, with the majority of time being spent on gradient accumulation for each query-passage pair during the adversarial token generation phase, making it impossible to generate an adequate number of adversarial passages in a reasonable amount of time. Moreover, the attack method itself assumes access to a set of user queries, a strong assumption that does not correspond to how real-world adversarial attacks are usually performed. In this paper, we first significantly boost the efficiency of HotFlip, reducing the adversarial generation process from 4 hours per document to only 15 minutes, using the same hardware. We further contribute experiments and analysis on two additional tasks: (1) transfer-based black-box attacks, and (2) query-agnostic attacks. Whenever possible, we provide comparisons between the original method and our improved version. Our experiments demonstrate that HotFlip can effectively attack a variety of dense retrievers, with an observed trend that its attack performance diminishes against more advanced and recent methods. Interestingly, we observe that while HotFlip performs poorly in a black-box setting, indicating limited capacity for generalization, in query-agnostic scenarios its performance is correlated with the volume of injected adversarial passages.
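To make the gradient-based substitution step in the abstract concrete, here is a minimal sketch of one HotFlip-style token swap. It is not the authors' implementation. It assumes a toy encoder that mean-pools token embeddings and scores a passage by its average dot product with a set of query vectors. The embedding table, the query vectors, and every function name are hypothetical and chosen only for illustration. The swap is scored with the standard first-order estimate: the gain from replacing token embedding e_old with e_new is approximately (e_new - e_old) · gradient. Because the toy encoder is linear, that estimate is exact here; for a real dense retriever it is only an approximation.

```python
# Toy HotFlip step. The encoder, data, and names are illustrative assumptions.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mean_vec(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def similarity(passage_ids, query_vecs, emb):
    """Average dot-product similarity between a mean-pooled passage and each query."""
    p = mean_vec([emb[t] for t in passage_ids])
    return sum(dot(p, q) for q in query_vecs) / len(query_vecs)

def accumulated_gradient(passage_ids, query_vecs):
    """Gradient of `similarity` w.r.t. one token embedding, accumulated per query.

    For this toy encoder, each query q contributes q / (len(passage) * num_queries).
    """
    n, m = len(passage_ids), len(query_vecs)
    grad = [0.0] * len(query_vecs[0])
    for q in query_vecs:  # one contribution per query-passage pair
        for d, value in enumerate(q):
            grad[d] += value / (n * m)
    return grad

def hotflip_step(passage_ids, position, query_vecs, emb):
    """Replace the token at `position` with the candidate of highest estimated gain."""
    grad = accumulated_gradient(passage_ids, query_vecs)
    old = emb[passage_ids[position]]
    # First-order estimate of the similarity gain for every vocabulary token.
    gains = [dot([e - o for e, o in zip(emb[t], old)], grad) for t in range(len(emb))]
    best = max(range(len(emb)), key=gains.__getitem__)
    new_ids = list(passage_ids)
    new_ids[position] = best
    return new_ids, gains[best]

# Hypothetical 4-token vocabulary with 2-dimensional embeddings.
emb = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [2.0, 2.0]]
queries = [[1.0, 1.0], [3.0, 1.0]]
passage = [2, 1]

before = similarity(passage, queries, emb)
new_passage, gain = hotflip_step(passage, 0, queries, emb)
after = similarity(new_passage, queries, emb)
print(new_passage, before, after, gain)
```

The loop in `accumulated_gradient` mirrors the per-pair gradient accumulation that the abstract identifies as the dominant cost. In a real attack each contribution requires a backward pass through the retriever, and the step is repeated over many positions and iterations.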

Code Implementations: 1 repo
