Reza Soosahabi

67.9CRApr 1

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive operational context in agentic AI applications. These instructions may contain sensitive information such as API credentials, internal policies, and privileged workflow definitions, making system instruction leakage a critical security risk highlighted in the OWASP Top 10 for LLM Applications. Without incurring the overhead costs of reasoning models, many LLM applications rely on refusal-based instructions that block direct requests for system instructions, implicitly assuming that prohibited information can only be extracted through explicit queries. We introduce an automated evaluation framework that tests whether system instructions remain confidential when extraction requests are re-framed as encoding or structured output tasks. Across four common models and 46 verified system instructions, we observe high attack success rates (> 0.7) for structured serialization where models refuse direct extraction requests but disclose protected content in the requested serialization formats. We further demonstrate a mitigation strategy based on one-shot instruction reshaping using a Chain-of-Thought reasoning model, indicating that even subtle changes in wording and structure of system instructions can significantly reduce attack success rate without requiring model retraining.

CRAug 27, 2021

On Securing MAC Layer Broadcast Signals Against Covert Channel Exploitation in 5G, 6G & Beyond

Reza Soosahabi, Magdy Bayoumi

In this work, we propose a novel framework to identify and mitigate a recently disclosed covert channel scheme exploiting unprotected broadcast messages in cellular MAC layer protocols. Examples of covert channel are used in data exfiltration, remote command-and-control (CnC) and espionage. Responsibly disclosed to GSMA (CVD-2021-0045), the SPARROW covert channel scheme exploits the downlink power of LTE/5G base-stations that broadcast contention resolution identity (CRI) from any anonymous device according to the 3GPP standards. Thus, the SPARROW devices can covertly relay short messages across long-distance which can be potentially harmful to critical infrastructure. The SPARROW schemes can also complement the solutions for long-range M2M applications. This work investigates the security vs. performance trade-off in CRI-based contention resolution mechanisms. Then it offers a rigorously designed method to randomly obfuscate CRI broadcast in future 5G/6G standards. Compared to CRI length reduction, the proposed method achieves considerable protection against SPARROW exploitation with less impact on the random-access performance as shown in the numerical results.

Reza Soosahabi

2 Papers