CR LG · Nov 18, 2024

GNN-Based Code Annotation Logic for Establishing Security Boundaries in C Code

Varun Gadey, Raphael Goetz, Christoph Sendner, Sampo Sovio, Alexandra Dmitrienko

arXiv:2411.11567v22.31 citations · h-index: 25

Originality: Incremental advance

AI Analysis

The paper addresses a challenge software developers face: securing applications with TEEs efficiently without specialized expertise. The work appears incremental, however, because it builds on existing graph-based and TEE approaches.

The paper tackles the problem of automatically identifying security-sensitive code components for isolation in Trusted Execution Environments (TEEs). It proposes Code Annotation Logic (CAL), which achieves a recall of 86.05%, an F1 score of 81.56%, and an identification rate of 91.59% for sensitive functions.
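To make the reported metrics concrete, here is a minimal sketch of how precision, recall, and F1 relate. The summary does not state precision. The value computed below is only what the standard F1 formula would imply, assuming the reported recall and F1 refer to the same class and the same evaluation set. The function names are illustrative and do not come from the paper.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard binary-classification metrics from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("no valid precision exists for this recall/F1 pair")
    return f1 * recall / denom


# Figures quoted in the summary (recall 86.05%, F1 81.56%).
# ASSUMPTION: both were computed for the same class on the same data.
p = implied_precision(recall=0.8605, f1=0.8156)
print(f"implied precision: {p:.4f}")  # about 0.775 under these assumptions
```

A recall higher than the implied precision would mean the model leans toward flagging code as sensitive, which is the safer direction of error when the goal is to avoid leaving sensitive code outside the TEE.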

Securing sensitive operations in today's interconnected software landscape is crucial yet challenging. Modern platforms rely on Trusted Execution Environments (TEEs), such as Intel SGX and ARM TrustZone, to isolate security-sensitive code from the main system, reducing the Trusted Computing Base (TCB) and providing stronger assurances. However, identifying which code should reside in TEEs is complex and requires specialized expertise, and current automated tools do not support this task. Existing solutions often migrate entire applications to TEEs, leading to suboptimal use and an increased TCB. To address this gap, we propose Code Annotation Logic (CAL), a pioneering tool that automatically identifies security-sensitive components for TEE isolation. CAL analyzes codebases using a graph-based approach with novel feature construction and a custom graph neural network model to determine which parts of the code should be isolated. CAL effectively optimizes the TCB, reducing the burden of manual analysis and enhancing overall security. Our contributions include the definition of security-sensitive code, the construction and labeling of a comprehensive dataset of source files, a feature-rich, graph-based data preparation pipeline, and the CAL model for TEE integration. Evaluation results demonstrate CAL's efficacy in identifying sensitive code, with a recall of 86.05%, an F1 score of 81.56%, and an identification rate of 91.59% for security-sensitive functions. By enabling efficient code isolation, CAL advances the secure development of applications using TEEs, offering developers a practical way to reduce attack vectors.
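The abstract does not describe CAL's graph construction or model architecture, so the following is only a generic illustration of why a graph-based view of code can help. It is a toy sketch of a single neighbor-aggregation step over a call graph. The graph, the function names, the single hand-made feature, and the fixed 0.5 mixing weight are all hypothetical. A real graph neural network would learn its weights from labeled data.

```python
from collections import defaultdict

# HYPOTHETICAL toy call graph: each function maps to the functions it calls.
CALL_GRAPH = {
    "main": ["load_config", "encrypt_file"],
    "load_config": [],
    "encrypt_file": ["derive_key", "aes_encrypt"],
    "derive_key": [],
    "aes_encrypt": [],
}

# HYPOTHETICAL one-dimensional feature: 1.0 if the function directly
# touches key material, else 0.0.
FEATURES = {
    "main": 0.0,
    "load_config": 0.0,
    "encrypt_file": 0.0,
    "derive_key": 1.0,
    "aes_encrypt": 1.0,
}


def undirected_adjacency(graph: dict[str, list[str]]) -> dict[str, set[str]]:
    """Treat caller/callee edges as undirected so signal flows both ways."""
    adj: dict[str, set[str]] = defaultdict(set)
    for caller, callees in graph.items():
        adj[caller]  # make sure isolated nodes still get an entry
        for callee in callees:
            adj[caller].add(callee)
            adj[callee].add(caller)
    return adj


def propagate(graph, feats, self_weight=0.5):
    """One aggregation round: mix each node's feature with its neighbors' mean."""
    adj = undirected_adjacency(graph)
    updated = {}
    for node, h in feats.items():
        nbrs = adj[node]
        mean_nbr = sum(feats[n] for n in nbrs) / len(nbrs) if nbrs else h
        updated[node] = self_weight * h + (1 - self_weight) * mean_nbr
    return updated


scores = propagate(CALL_GRAPH, FEATURES)
for name, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {score:.3f}")
```

In this toy setup, `encrypt_file` starts with a feature of 0.0 but picks up a nonzero score because it calls two key-handling functions, while `load_config` stays at 0.0. That is the kind of structural context a per-function classifier without graph information would miss.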
