Safe and Practical GPU Acceleration in TrustZone
This work addresses secure GPU acceleration for mobile devices in TrustZone. It is an incremental improvement on the record-and-replay approach: it supplies the missing recording environment and adds optimizations specific to mobile GPUs.
The paper tackles the problem of securely accelerating GPU computation in the TrustZone TEE. It introduces CODY, a collaborative architecture in which a mobile device and a cloud service record CPU/GPU interactions ahead of time so they can be replayed in the TEE at run time. Recording takes up to 95% less time than a naive approach, and replay incurs 25% lower delays than insecure native execution.
We present a holistic design for GPU-accelerated computation in TrustZone TEE. Without pulling the complex GPU software stack into the TEE, we follow a simple approach: record the CPU/GPU interactions ahead of time, and replay the interactions in the TEE at run time. This paper addresses the approach's key missing piece -- the recording environment, which needs both strong security and access to diverse mobile GPUs. To this end, we present a novel architecture called CODY, in which a mobile device (which possesses the GPU hardware) and a trustworthy cloud service (which runs the GPU software) exercise the GPU hardware/software in a collaborative, distributed fashion. To overcome numerous network round trips and long delays, CODY contributes optimizations specific to mobile GPUs: register access deferral, speculation, and metastate-only synchronization. With these optimizations, recording a compute workload takes only tens of seconds, which is up to 95% less than a naive approach; replay incurs 25% lower delays compared to insecure, native execution.
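The record-and-replay idea and the benefit of deferring register accesses can be illustrated with a toy model. The sketch below is not CODY's implementation: `ToyGPU`, `Recorder`, `replay`, `count_round_trips`, `toy_driver_job`, and the register addresses are all hypothetical names chosen for illustration. The deferral policy (queue writes and send them together with the next read) is a simplifying assumption, not the policy described in the paper.

```python
from dataclasses import dataclass, field


class ToyGPU:
    """Hypothetical stand-in for GPU hardware: a bank of registers."""

    def __init__(self):
        self.regs = {}

    def write(self, addr, value):
        self.regs[addr] = value

    def read(self, addr):
        return self.regs.get(addr, 0)


@dataclass
class Recorder:
    """Forwards each register access to the GPU and logs it in order."""
    gpu: ToyGPU
    log: list = field(default_factory=list)

    def write(self, addr, value):
        self.gpu.write(addr, value)
        self.log.append(("w", addr, value))

    def read(self, addr):
        value = self.gpu.read(addr)
        self.log.append(("r", addr, value))
        return value


def replay(log, gpu):
    """Re-issue logged accesses; fail if a read differs from the recording."""
    for op, addr, value in log:
        if op == "w":
            gpu.write(addr, value)
        elif gpu.read(addr) != value:
            raise RuntimeError(f"replay diverged at register {addr:#x}")


def count_round_trips(log, defer):
    """Count device/cloud round trips for a log (toy cost model).

    Without deferral every access is one round trip. With deferral
    (assumed policy), pending writes ride along with the next read.
    """
    if not defer:
        return len(log)
    trips, pending = 0, 0
    for op, _, _ in log:
        if op == "w":
            pending += 1
        else:
            trips += 1  # this read also carries all pending writes
            pending = 0
    return trips + (1 if pending else 0)


def toy_driver_job(dev):
    """Hypothetical driver sequence; register addresses are made up."""
    dev.write(0x00, 1)       # power on
    dev.write(0x04, 0xBEEF)  # job descriptor address
    dev.write(0x08, 1)       # start job
    return dev.read(0x08)    # poll status


recorder = Recorder(ToyGPU())
toy_driver_job(recorder)            # recording phase
replay(recorder.log, ToyGPU())      # replay phase, no driver involved
print(count_round_trips(recorder.log, defer=False),
      count_round_trips(recorder.log, defer=True))
```

In this toy model the replayer needs only the log and the hardware, which mirrors the motivation for keeping the GPU software stack out of the TEE. The round-trip count also shows why batching accesses matters when the driver and the hardware are separated by a network.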