CLApr 10, 2025

Defense against Prompt Injection Attacks via Mixture of Encodings

arXiv:2504.07467v119 citationsh-index: 6NAACL
Originality Incremental advance
AI Analysis

This addresses security vulnerabilities in LLMs for users relying on external information, but it is incremental as it builds on existing character encoding-based defenses.

The paper tackles the problem of prompt injection attacks on Large Language Models by proposing a mixture of encodings defense, which achieves one of the lowest attack success rates while maintaining high performance across NLP tasks.

Large Language Models (LLMs) have emerged as a dominant approach for a wide range of NLP tasks, with their access to external information further enhancing their capabilities. However, this introduces new vulnerabilities, known as prompt injection attacks, where external content embeds malicious instructions that manipulate the LLM's output. Recently, the Base64 defense has been recognized as one of the most effective methods for reducing success rate of prompt injection attacks. Despite its efficacy, this method can degrade LLM performance on certain NLP tasks. To address this challenge, we propose a novel defense mechanism: mixture of encodings, which utilizes multiple character encodings, including Base64. Extensive experimental results show that our method achieves one of the lowest attack success rates under prompt injection attacks, while maintaining high performance across all NLP tasks, outperforming existing character encoding-based defense methods. This underscores the effectiveness of our mixture of encodings strategy for both safety and task performance metrics.

Code Implementations1 repo
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes