CR · Apr 14, 2020

Topology-Aware Hashing for Effective Control Flow Graph Similarity Analysis

arXiv:2004.06563v1
Originality: Incremental advance
AI Analysis

This work addresses a domain-specific problem for security analysts, offering incremental improvements in CFG similarity analysis.

The paper targets the limited efficiency, accuracy, and usability of Control Flow Graph (CFG) similarity analysis in security tasks. It proposes a novel fuzzy hashing scheme, topology-aware hashing (TAH), which the authors report outperforms existing techniques in both effectiveness and efficiency, as demonstrated on malware clustering.

Control Flow Graph (CFG) similarity analysis is an essential technique for a variety of security analysis tasks, including malware detection and malware clustering. Even though various algorithms have been developed, existing CFG similarity analysis methods still suffer from limited efficiency, accuracy, and usability. In this paper, we propose a novel fuzzy hashing scheme called topology-aware hashing (TAH) for effective and efficient CFG similarity analysis. Given the CFGs constructed from program binaries, we extract blended n-gram graphical features of the CFGs, encode the graphical features into numeric vectors (called graph signatures), and then measure the graph similarity by comparing the graph signatures. We further employ a fuzzy hashing technique to convert the numeric graph signatures into smaller fixed-size fuzzy hash signatures for efficient similarity calculation. Our comprehensive evaluation demonstrates that TAH is more effective and efficient compared to existing CFG comparison techniques. To demonstrate the applicability of TAH to real-world security analysis tasks, we develop a binary similarity analysis tool based on TAH, and show that it outperforms existing similarity analysis tools while conducting malware clustering.
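The abstract outlines a pipeline (CFG, then n-gram graph features, then a numeric graph signature, then a compact fuzzy hash) but does not define the features or the hash. The following is a minimal Python sketch of that pipeline under loudly stated assumptions. Each basic block is labeled by its (in-degree, out-degree) pair, every walk of 1 to `max_n` blocks counts as a "blended" n-gram feature, feature counts are folded into a fixed-length vector by feature hashing, and MinHash stands in for the fuzzy hashing step. All names and parameters (`graph_signature`, `fuzzy_hash`, `dim`, `k`, `max_n`) are hypothetical, and TAH's actual feature definitions and hashing scheme may differ.

```python
import hashlib
from collections import Counter

# Assumption: a CFG is a dict mapping each basic-block id to its successor ids.
CFG = dict[int, list[int]]

def stable_hash(obj, salt: bytes = b"") -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    digest = hashlib.blake2b(salt + repr(obj).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def node_labels(cfg: CFG) -> dict[int, tuple[int, int]]:
    """Label each block by (in-degree, out-degree): a purely topological label."""
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    indeg = Counter(s for succs in cfg.values() for s in succs)
    return {v: (indeg[v], len(cfg.get(v, []))) for v in nodes}

def ngram_features(cfg: CFG, max_n: int = 3) -> Counter:
    """Count label sequences along every walk of 1..max_n blocks (blended n-grams)."""
    labels = node_labels(cfg)
    feats: Counter = Counter()

    def walk(path: list[int]) -> None:
        feats[tuple(labels[v] for v in path)] += 1
        if len(path) < max_n:  # bounded length, so cycles cannot recurse forever
            for succ in cfg.get(path[-1], []):
                walk(path + [succ])

    for node in labels:
        walk([node])
    return feats

def graph_signature(cfg: CFG, dim: int = 64, max_n: int = 3) -> list[int]:
    """Fold feature counts into a fixed-length numeric vector via feature hashing."""
    vec = [0] * dim
    for feat, count in ngram_features(cfg, max_n).items():
        vec[stable_hash(feat) % dim] += count
    return vec

def signature_similarity(a: list[int], b: list[int]) -> float:
    """Weighted Jaccard similarity: sum of element-wise mins over sum of maxes."""
    num = sum(min(x, y) for x, y in zip(a, b))
    den = sum(max(x, y) for x, y in zip(a, b))
    return num / den if den else 1.0

def fuzzy_hash(cfg: CFG, k: int = 16, max_n: int = 3) -> list[int]:
    """MinHash over the feature set (a stand-in fuzzy hash): k fixed-size slots."""
    feats = list(ngram_features(cfg, max_n))
    if not feats:
        return [0] * k
    return [min(stable_hash(f, salt=i.to_bytes(2, "big")) for f in feats)
            for i in range(k)]

def fuzzy_similarity(h1: list[int], h2: list[int]) -> float:
    """Fraction of matching MinHash slots, an estimate of feature-set Jaccard."""
    return sum(x == y for x, y in zip(h1, h2)) / len(h1)

# Usage: an if/else diamond, and the same diamond wrapped in a loop.
g1 = {0: [1, 2], 1: [3], 2: [3], 3: []}
g2 = {0: [1, 2], 1: [3], 2: [3], 3: [0]}
print(signature_similarity(graph_signature(g1), graph_signature(g1)))  # 1.0
print(signature_similarity(graph_signature(g1), graph_signature(g2)))
print(fuzzy_similarity(fuzzy_hash(g1), fuzzy_hash(g2)))
```

MinHash is used here because the fraction of matching slots estimates the Jaccard similarity of two feature sets, so comparing two `k`-slot hashes costs O(k) regardless of graph size. This matches the efficiency goal the abstract describes, but it is an illustrative substitute, not a claim about how TAH builds its fuzzy hash signatures.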
