79.8CRMay 20
A Large Language Model Approach to Generating Bypass Rules for Malware Evasion in Analysis SandboxZhiyong Sui, Lamine Noureddine, Mst Eshita Khatun et al.
Sandbox evasion remains a critical challenge for automated malware analysis, as modern malware employs environment checks to detect analysis platforms and suppress malicious behavior. Existing approaches rely on manually crafted bypass rules that require deep reverse engineering of each evasion mechanism -an approach that cannot scale against rapidly evolving evasion techniques. In this paper, we leverage large language models (LLMs) to automatically generate YARA rules that bypass evasion checks in sandbox environments. We propose ABLE, which analyzes execution traces from malware terminated due to potentially evasive behavior and employs multiple reasoning strategies to generate targeted bypass rules. To address syntactic errors and improve the efficacy of the bypass rules in the LLM outputs, we introduce an auto-sanitization pipeline and feedback-driven iterative refinement. We evaluate ABLE on 334 real-world malware samples across four open-weight LLMs. ABLE achieves a 79% bypass success rate, with iterative refinement contributing 29.5% of successful cases. Compared to existing analysis platforms, ABLE identifies 47% more malware family classifications and exposes previously hidden behaviors.
CROct 23, 2025Code
REx86: A Local Large Language Model for Assisting in x86 Assembly Reverse EngineeringDarrin Lea, James Ghawaly, Golden Richard et al.
Reverse engineering (RE) of x86 binaries is indispensable for malware and firmware analysis, but remains slow due to stripped metadata and adversarial obfuscation. Large Language Models (LLMs) offer potential for improving RE efficiency through automated comprehension and commenting, but cloud-hosted, closed-weight models pose privacy and security risks and cannot be used in closed-network facilities. We evaluate parameter-efficient fine-tuned local LLMs for assisting with x86 RE tasks in these settings. Eight open-weight models across the CodeLlama, Qwen2.5-Coder, and CodeGemma series are fine-tuned on a custom curated dataset of 5,981 x86 assembly examples. We evaluate them quantitatively and identify the fine-tuned Qwen2.5-Coder-7B as the top performer, which we name REx86. REx86 reduces test-set cross-entropy loss by 64.2% and improves semantic cosine similarity against ground truth by 20.3\% over its base model. In a limited user case study (n=43), REx86 significantly enhanced line-level code understanding (p = 0.031) and increased the correct-solve rate from 31% to 53% (p = 0.189), though the latter did not reach statistical significance. Qualitative analysis shows more accurate, concise comments with fewer hallucinations. REx86 delivers state-of-the-art assistance in x86 RE among local, open-weight LLMs. Our findings demonstrate the value of domain-specific fine-tuning, and highlight the need for more commented disassembly data to further enhance LLM performance in RE. REx86, its dataset, and LoRA adapters are publicly available at https://github.com/dlea8/REx86 and https://zenodo.org/records/15420461.
CRJan 8, 2025
Exploring Large Language Models for Semantic Analysis and Categorization of Android MalwareBrandon J Walton, Mst Eshita Khatun, James M Ghawaly et al.
Malware analysis is a complex process of examining and evaluating malicious software's functionality, origin, and potential impact. This arduous process typically involves dissecting the software to understand its components, infection vector, propagation mechanism, and payload. Over the years, deep reverse engineering of malware has become increasingly tedious, mainly due to modern malicious codebases' fast evolution and sophistication. Essentially, analysts are tasked with identifying the elusive needle in the haystack within the complexities of zero-day malware, all while under tight time constraints. Thus, in this paper, we explore leveraging Large Language Models (LLMs) for semantic malware analysis to expedite the analysis of known and novel samples. Built on GPT-4o-mini model, \msp is designed to augment malware analysis for Android through a hierarchical-tiered summarization chain and strategic prompt engineering. Additionally, \msp performs malware categorization, distinguishing potential malware from benign applications, thereby saving time during the malware reverse engineering process. Despite not being fine-tuned for Android malware analysis, we demonstrate that through optimized and advanced prompt engineering \msp can achieve up to 77% classification accuracy while providing highly robust summaries at functional, class, and package levels. In addition, leveraging the backward tracing of the summaries from package to function levels allowed us to pinpoint the precise code snippets responsible for malicious behavior.
CYFeb 14, 2022
Intent-Aware Permission Architecture: A Model for Rethinking Informed Consent for Android AppsMd Rashedur Rahman, Elizabeth Miller, Moinul Hossain et al.
As data privacy continues to be a crucial human-right concern as recognized by the UN, regulatory agencies have demanded developers obtain user permission before accessing user-sensitive data. Mainly through the use of privacy policies statements, developers fulfill their legal requirements to keep users abreast of the requests for their data. In addition, platforms such as Android enforces explicit permission request using the permission model. Nonetheless, recent research has shown that service providers hardly make full disclosure when requesting data in these statements. Neither is the current permission model designed to provide adequate informed consent. Often users have no clear understanding of the reason and scope of usage of the data request. This paper proposes an unambiguous, informed consent process that provides developers with a standardized method for declaring Intent. Our proposed Intent-aware permission architecture extends the current Android permission model with a precise mechanism for full disclosure of purpose and scope limitation. The design of which is based on an ontology study of data requests purposes. The overarching objective of this model is to ensure end-users are adequately informed before making decisions on their data. Additionally, this model has the potential to improve trust between end-users and developers.