Carter Yagemann

h-index: 7

4 papers

200 citations

4 Papers

5.9 · CR · Apr 2

EXHIB: A Benchmark for Realistic and Diverse Evaluation of Function Similarity in the Wild

Yiming Fan, Jun Yeon Won, Ding Zhu et al.

Binary Function Similarity Detection (BFSD) is a core problem in software security, supporting tasks such as vulnerability analysis, malware classification, and patch provenance. In the past few decades, numerous models and tools have been developed for this application; however, due to the lack of a comprehensive universal benchmark in this field, researchers have struggled to compare different models effectively. Existing datasets are limited in scope, often focusing on a narrow set of transformations or types of binaries, and fail to reflect the full diversity of real-world applications. We introduce EXHIB, a benchmark comprising five realistic datasets collected from the wild, each highlighting a distinct aspect of the BFSD problem space. We evaluate 9 representative models spanning multiple BFSD paradigms on EXHIB and observe performance degradations of up to 30% on firmware and semantic datasets compared to standard settings, revealing substantial generalization gaps. Our results show that robustness to low- and mid-level binary variations does not generalize to high-level semantic differences, underscoring a critical blind spot in current BFSD evaluation practices.
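The abstract does not say how the nine models score similarity or which metric EXHIB reports. As a rough illustration only, many embedding-based BFSD models compare function embeddings with cosine similarity and are evaluated by ranking a pool of candidate functions for each query. The Go sketch below assumes that setup; `cosine`, `recallAt1`, and the toy vectors are hypothetical and are not EXHIB's metrics or data.

```go
package main

import (
	"fmt"
	"math"
)

// cosine returns the cosine similarity of two equal-length embedding vectors.
// A zero vector has no direction, so it scores 0 against everything.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// recallAt1 reports the fraction of queries whose true match (queries[i]
// pairs with pool[i]) is ranked first among all pool candidates.
func recallAt1(queries, pool [][]float64) float64 {
	if len(queries) == 0 {
		return 0
	}
	hits := 0
	for i, q := range queries {
		best, bestScore := -1, math.Inf(-1)
		for j, c := range pool {
			if s := cosine(q, c); s > bestScore {
				best, bestScore = j, s
			}
		}
		if best == i {
			hits++
		}
	}
	return float64(hits) / float64(len(queries))
}

func main() {
	// Toy embeddings: each query is a perturbed copy of its pool entry,
	// standing in for the same function compiled under different settings.
	pool := [][]float64{{1, 0, 0}, {0, 1, 0}, {0, 0, 1}}
	queries := [][]float64{{0.9, 0.1, 0}, {0, 0.8, 0.3}, {0.6, 0, 0.5}}
	fmt.Printf("Recall@1 = %.2f\n", recallAt1(queries, pool))
}
```

Running it prints `Recall@1 = 0.67`: the third query lies closer to the wrong pool entry. A drop in this kind of ranking score when moving to a harder dataset is one way a generalization gap becomes visible.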

2.7 · CR · Jun 24

Beyond Takedown: Measuring Malicious Go Module Persistence in the Wild

Minjae Bae, Carter Yagemann

We measure an automation-based supply chain campaign in the Go ecosystem. The attackers repackage legitimate Go modules under attacker-controlled owners and embed obfuscated code that implements an import-triggered downloader. Our results come from two complementary analyses: (a) a manual search on GitHub across 2,113 repositories and (b) a large-scale scan of 12.3M index entries using GOAST, a deobfuscating AST scanner that we implemented. In total, we identified 2,289 malicious versions of legitimate Go modules. We demonstrate that purely GitHub-centric searches fail to identify the full extent of the compromise and are effective only for as long as the affected code remains on the platform. Moreover, our proxy-based measurements of the takedown-remediation gap reveal that, among artifacts later found to be unobservable on GitHub (i.e., removed or suspended), at least 99.4% remained retrievable via the Go module proxy. Following our disclosure, GitHub removed 684 malicious repositories and the Google Go team remediated 1,377 module versions.
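An import-triggered downloader relies on the fact that a Go package's `init` functions and package-level variable initializers run during package initialization, before the importing program's `main`. GOAST is not described beyond the abstract, so the following is only a minimal, hypothetical sketch of the non-deobfuscating core of such a scanner. It uses the standard `go/parser` and `go/ast` packages to flag calls such as `exec.Command` in those import-time positions. The `suspiciousCalls` list and the `importTimeCalls` function are illustrative assumptions, and the sketch does not handle aliased imports, string obfuscation, or logic spread across files.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// suspiciousCalls is an illustrative, hypothetical list of package-qualified
// calls that are red flags when they run at import time.
var suspiciousCalls = map[string]bool{
	"exec.Command": true,
	"http.Get":     true,
}

// importTimeCalls parses one Go source file and reports suspicious calls that
// appear in func init bodies or package-level var initializers, both of which
// run during package initialization.
func importTimeCalls(filename, src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var roots []ast.Node
	for _, decl := range file.Decls {
		switch d := decl.(type) {
		case *ast.FuncDecl:
			if d.Recv == nil && d.Name.Name == "init" && d.Body != nil {
				roots = append(roots, d.Body)
			}
		case *ast.GenDecl:
			if d.Tok == token.VAR {
				roots = append(roots, d)
			}
		}
	}
	var hits []string
	for _, root := range roots {
		ast.Inspect(root, func(n ast.Node) bool {
			call, ok := n.(*ast.CallExpr)
			if !ok {
				return true
			}
			// Match calls of the form pkg.Func(...), e.g. exec.Command(...).
			if sel, ok := call.Fun.(*ast.SelectorExpr); ok {
				if pkg, ok := sel.X.(*ast.Ident); ok && suspiciousCalls[pkg.Name+"."+sel.Sel.Name] {
					hits = append(hits, fmt.Sprintf("%s.%s at %s", pkg.Name, sel.Sel.Name, fset.Position(call.Pos())))
				}
			}
			return true
		})
	}
	return hits, nil
}

func main() {
	src := `package lookalike

import "os/exec"

func init() {
	exec.Command("sh", "-c", "echo fetched payload").Start()
}
`
	hits, err := importTimeCalls("lookalike.go", src)
	if err != nil {
		panic(err)
	}
	for _, h := range hits {
		fmt.Println(h)
	}
}
```

The example prints one finding for the `exec.Command` call inside `init`. The heuristic can over-report (a function literal assigned to a package-level variable is flagged even though its body runs only when called), which is acceptable for triage but not for a final verdict.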

4.1 · CR · Jun 23

Burnyard: Future of Malware Analysis

Rama Ramana Sharma Parnandi, Carter Yagemann

Malware analysis is a critical aspect of modern cybersecurity. The prevailing industry practice, sandboxing, involves executing suspicious binaries within isolated virtual machines in large-scale data centers. However, this approach can unintentionally expose samples to public platforms such as VirusTotal and MalwareBazaar, and it is both resource-intensive and time-consuming. Burnyard addresses these limitations through a lightweight binary emulation platform that captures observable runtime behavior and records it as structured CSV event traces.

4.9 · CR · Apr 30, 2019

To believe or not to believe: Validating explanation fidelity for dynamic malware analysis

Li Chen, Carter Yagemann, Evan Downing

Converting malware into images and then applying vision-based deep learning algorithms has shown superior threat detection efficacy compared with classical machine learning algorithms. When malware samples are visualized as images, vision-based interpretation schemes can also be applied to extract insights into why individual samples are classified as malicious. In this work, via two case studies of dynamic malware classification, we extend the local interpretable model-agnostic explanations (LIME) algorithm to explain image-based dynamic malware classification and examine its interpretation fidelity. For both case studies, we first train deep learning models via transfer learning on malware images, demonstrate high classification effectiveness, apply an explanation method to the images, and correlate the results back to the samples to validate whether the algorithmic insights are consistent with security domain expertise. In our first case study, the interpretation framework identifies indirect calls that uniquely characterize the underlying exploit behavior of a malware family. In our second case study, the interpretation framework extracts insightful information, such as cryptography-related APIs, when applied to images created from API existence, but generates ambiguous interpretations on images created from API sequences and frequencies. Our findings indicate that current image-based interpretation techniques are promising for explaining vision-based malware classification. We continue to develop image-based interpretation schemes specifically for security applications.