CRAIAug 16, 2025

Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions

arXiv:2508.13214v13 citationsh-index: 7
Originality Incremental advance
AI Analysis

This work addresses a critical security issue for LLM-as-a-judge applications in domains like education and peer review, though it is incremental in exploring a specific attack setting.

The study tackled the problem of LLM vulnerability to hidden prompt injection attacks in simple arithmetic questions, revealing that LLMs are easily misled even in trivial scenarios, highlighting serious robustness risks.

Large Language Models (LLMs) have recently demonstrated strong emergent abilities in complex reasoning and zero-shot generalization, showing unprecedented potential for LLM-as-a-judge applications in education, peer review, and data quality evaluation. However, their robustness under prompt injection attacks, where malicious instructions are embedded into the content to manipulate outputs, remains a significant concern. In this work, we explore a frustratingly simple yet effective attack setting to test whether LLMs can be easily misled. Specifically, we evaluate LLMs on basic arithmetic questions (e.g., "What is 3 + 2?") presented as either multiple-choice or true-false judgment problems within PDF files, where hidden prompts are injected into the file. Our results reveal that LLMs are indeed vulnerable to such hidden prompt injection attacks, even in these trivial scenarios, highlighting serious robustness risks for LLM-as-a-judge applications.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes