CL | AI | Dec 29, 2025

Multilingual Hidden Prompt Injection Attacks on LLM-Based Academic Reviewing

arXiv:2512.23684v1 | 1 citations | h-index: 4
Originality: Highly original
AI Analysis

This work addresses a security risk for academic reviewing systems using LLMs, revealing language-specific vulnerabilities that could impact fairness and integrity in high-stakes workflows.

The study investigated how vulnerable LLM-based academic peer review is to hidden prompt injection attacks. The authors embedded adversarial instructions, written in four languages, into approximately 500 real papers accepted to ICML. English, Japanese, and Chinese injections produced substantial changes in review scores and accept/reject decisions, while Arabic injections had little to no effect.

Large language models (LLMs) are increasingly considered for use in high-impact workflows, including academic peer review. However, LLMs are vulnerable to document-level hidden prompt injection attacks. In this work, we construct a dataset of approximately 500 real academic papers accepted to ICML and evaluate the effect of embedding hidden adversarial prompts within these documents. Each paper is injected with semantically equivalent instructions in four different languages and reviewed using an LLM. We find that prompt injection induces substantial changes in review scores and accept/reject decisions for English, Japanese, and Chinese injections, while Arabic injections produce little to no effect. These results highlight the susceptibility of LLM-based reviewing systems to document-level prompt injection and reveal notable differences in vulnerability across languages.
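The comparison described in the abstract, reviewing each paper with and without an injected prompt and then looking at score changes and accept/reject flips, can be sketched as follows. This is a minimal illustration and not the authors' code: the `Review` record, the `injection_effect` function, the two metrics, and the toy values are all assumptions made for the example, and the numbers are placeholders rather than results from the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Review:
    """Hypothetical record of one LLM review (not the paper's schema)."""
    paper_id: str
    score: float   # overall review score on whatever scale the reviewer uses
    accept: bool   # accept/reject decision


def injection_effect(baseline, injected):
    """Compare clean and injected reviews of the same papers.

    Returns the number of matched papers, the mean score shift
    (injected minus baseline), and the share of papers whose
    accept/reject decision changed.
    """
    base = {r.paper_id: r for r in baseline}
    pairs = [(base[r.paper_id], r) for r in injected if r.paper_id in base]
    if not pairs:
        raise ValueError("no papers appear in both review sets")
    shift = mean(after.score - before.score for before, after in pairs)
    flips = sum(before.accept != after.accept for before, after in pairs)
    return {
        "n": len(pairs),
        "mean_score_shift": shift,
        "decision_flip_rate": flips / len(pairs),
    }


# Toy placeholder values for illustration only; not data from the paper.
baseline = [Review("p1", 6.0, True), Review("p2", 4.0, False)]
injected = [Review("p1", 3.0, False), Review("p2", 4.0, False)]
effect = injection_effect(baseline, injected)
```

Running the same function once per injection language (English, Japanese, Chinese, Arabic) against a shared baseline would give a per-language comparison of the kind the abstract reports.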

