CRAILGAug 25, 2025

Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models

arXiv:2508.17674v21 citationsh-index: 3Has Code
Originality Incremental advance
AI Analysis

This addresses a critical security gap for stakeholders using LLMs and AI agents, highlighting an urgent need for detection and policy responses, though it is incremental in building on existing attack vectors.

The paper tackles the problem of security threats to large language models (LLMs) and AI agents by introducing Advertisement Embedding Attacks (AEA), which stealthily inject promotional or malicious content into model outputs, compromising information integrity without degrading accuracy.

We introduce Advertisement Embedding Attacks (AEA), a new class of LLM security threats that stealthily inject promotional or malicious content into model outputs and AI agents. AEA operate through two low-cost vectors: (1) hijacking third-party service-distribution platforms to prepend adversarial prompts, and (2) publishing back-doored open-source checkpoints fine-tuned with attacker data. Unlike conventional attacks that degrade accuracy, AEA subvert information integrity, causing models to return covert ads, propaganda, or hate speech while appearing normal. We detail the attack pipeline, map five stakeholder victim groups, and present an initial prompt-based self-inspection defense that mitigates these injections without additional model retraining. Our findings reveal an urgent, under-addressed gap in LLM security and call for coordinated detection, auditing, and policy responses from the AI-safety community.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes