Speed Kills: Exploring Confused Deputy Attacks Through Edge AI Accelerators
For system security researchers and vendors, this work reveals critical security risks in Edge AI accelerators due to semantic gaps, with practical implications for billions of devices.
This paper studies Confused Deputy Attacks (CDAs) on Edge AI Accelerators (AIAs), finding that six out of seven tested AIAs are vulnerable, impacting over 128 SoCs and 100 million devices. A defense mechanism incurs ~15% runtime overhead.
AI Accelerator (AIA) are specialized hardware e.g., Tensor Processing Unit (TPU), that enable optimal and efficient execution of AI applications and on-device inference. The growing demand for AI applications has led to the widespread adoption of AIAs on Edge or embedded devices on Edge or embedded devices. Unlike applications, AIAs are not bound by Operating System (OS) restrictions and have limited visibility into Application Processor (AP) security mechanisms (e.g., kernel vs. application memory, process isolation). This semantic gap can lead to confused deputy vulnerabilities, i.e., AIA can be tricked by a malicious application to perform privileged operations on their behalf. In this paper, we conducted the first in-depth study of Confused Deputy Attacks (CDAs) using AIA. We design DeputyHunt, a Large Language Model (LLM) assisted framework to extract CDA relevant information for a given AIA through a combination of dynamic and static analysis. We used this information to explore the feasibility of CDA on seven different AIAs from popular vendors, i.e., Google, NVIDIA, Hailo, Texas Instruments, NXP, AWS, and Rockchip. Our analysis revealed that CDA is feasible on six out of the seven AIAs, impacting over 128 System On Chips (SOCs) and over 100 million devices. Our findings highlight critical security risks posed by AIA on system security. Our work has been acknowledged by the corresponding vendors and assigned the CVE-2025-66425. We propose an on-demand validation defense against CDA, and evaluation on the Gem5- salam simulator shows that it incurs minimal runtime overhead (i.e., ~15%).