SEAI, Oct 11, 2022

Leveraging Artificial Intelligence on Binary Code Comprehension

arXiv:2210.05103v1, 5 citations, h-index: 37
Originality: Synthesis-oriented
AI Analysis

The paper addresses a software engineering challenge relevant to tasks such as reverse engineering and malware analysis, but it appears incremental because it builds on the existing idea of using source code knowledge.

The paper tackles the problem of understanding binary code, which is hard because binaries carry limited semantic information. It proposes AI models that incorporate domain knowledge from source code to aid human comprehension, and it plans to use human studies to investigate metrics for assessing such models.

Understanding binary code is an essential but complex software engineering task for reverse engineering, malware analysis, and compiler optimization. Unlike source code, binary code has limited semantic information, which makes it challenging for human comprehension. At the same time, compiling source to binary code, or transpiling among different programming languages (PLs), can provide a way to introduce external knowledge into binary comprehension. We propose to develop Artificial Intelligence (AI) models that aid human comprehension of binary code. Specifically, we propose to incorporate domain knowledge from large corpora of source code (e.g., variable names, comments) to build AI models that capture a generalizable representation of binary code. Lastly, we will investigate metrics to assess the performance of models that apply to binary code by using human studies of comprehension.

