52.3CRMay 21Code
The CTI Echo Chamber: Fragmentation, Overlap, and Vendor Specificity in Twenty Years of Cyber Threat ReportingManuel Suarez-Roman, Francesco Marchiori, Mauro Conti et al.
Despite the high volume of open-source Cyber Threat Intelligence (CTI), our understanding of long-term threat actor-victim dynamics remains fragmented due to inconsistent reporting standards and the lack of structured datasets containing comprehensive analytic information. In this paper, we present a large-scale automated analysis of open-source CTI reports spanning two decades. We develop a high-precision, LLM-based pipeline to ingest and structure 16,096 reports, extracting key entities such as attributed threat actors, motivations, victims, reporting vendors, and technical indicators (IoCs and TTPs). Our analysis quantifies the evolution of CTI information density and specialization, characterizing patterns that relate specific threat actors to motivations and victim profiles. Furthermore, we perform a meta-analysis of the CTI industry itself. We identify a fragmented ecosystem of distinct silos where vendors demonstrate significant geographic and sectoral reporting biases. Our marginal coverage analysis reveals that intelligence overlap between vendors is typically low: while a few core providers may offer broad situational awareness, additional sources yield diminishing returns. Overall, our findings characterize the structural biases inherent in the CTI ecosystem, enabling practitioners and researchers to better evaluate the completeness of their intelligence sources.
CRMay 7, 2019Code
An Analysis of Pre-installed Android SoftwareJulien Gamba, Mohammed Rashed, Abbas Razaghpanah et al.
The open-source nature of the Android OS makes it possible for manufacturers to ship custom versions of the OS along with a set of pre-installed apps, often for product differentiation. Some device vendors have recently come under scrutiny for potentially invasive private data collection practices and other potentially harmful or unwanted behavior of the pre-installed apps on their devices. Yet, the landscape of pre-installed software in Android has largely remained unexplored, particularly in terms of the security and privacy implications of such customizations. In this paper, we present the first large-scale study of pre-installed software on Android devices from more than 200 vendors. Our work relies on a large dataset of real-world Android firmware acquired worldwide using crowd-sourcing methods. This allows us to answer questions related to the stakeholders involved in the supply chain, from device manufacturers and mobile network operators to third-party organizations like advertising and tracking services, and social network platforms. Our study allows us to also uncover relationships between these actors, which seem to revolve primarily around advertising and data-driven services. Overall, the supply chain around Android's open source model lacks transparency and has facilitated potentially harmful behaviors and backdoored access to sensitive data and services without user consent or awareness. We conclude the paper with recommendations to improve transparency, attribution, and accountability in the Android ecosystem.
51.3CRMay 7
Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and EncryptionJaime Morales, Sergio Pastrana, Juan Tapiador
Software obfuscation and encryption present persistent challenges for program comprehension and security analysis, particularly when adversaries conceal Indicators of Compromise (IoCs) such as IP addresses within source code. While Large Language Models (LLMs) have recently demonstrated remarkable progress in code reasoning and transformation, their resilience against adversarial concealment techniques remains largely uncharted. This paper introduces a systematic benchmark for secret detection under adversarial code transformations, designed to evaluate the capacity of LLMs to recover IoCs embedded in obfuscated and encrypted JavaScript programs. We construct a dataset of 336 programs, progressively transformed through 12 levels of obfuscation and cryptographic concealment (including XOR and AES-256), to emulate realistic threat scenarios. An automated evaluation framework standardizes LLM queries and responses, enabling reproducible, large-scale testing across diverse models. Our results reveal a dichotomy: while LLMs exhibit high success against lightweight transformations such as variable renaming and Base64 encoding, encryption-based concealment severely degrades detection performance. These findings establish encryption as a critical frontier for LLM-driven code analysis and highlight both current limitations and avenues for advancing automated threat intelligence.
32.3CRMay 5
The Infinite Mutation Engine? Measuring Polymorphism in LLM-Generated Offensive CodeGabriel Hortea, Juan Tapiador
Malware authors have traditionally relied on polymorphic techniques to produce variants in the same malware family, complicating signature-based detection. Integrating generative AI into offensive toolchains enables attackers to synthesize structurally diverse payloads with identical behavior, raising the question of how much polymorphism LLMs provide. Recent work has assumed that LLMs can produce sufficiently polymorphic payloads, leaving unquantified the variation that emerges when an attacker repeatedly builds the same payload, or explicitly instructs the model to avoid prior implementations. In this work, we measure the polymorphic capacity of a commercial model (Claude Opus 4.6) as an automated malware generator. We build a dual-agent, four-stage pipeline that generates, tests, and refines a data-exfiltration payload comprising file traversal, encryption, exfiltration, and integration. We produce payloads in two settings: using prompts that specify only functional requirements, and using prompts that inject a structured history of prior outcomes to force divergence. We measure pairwise distances along structural (AST) and semantic (embedding) axes, finding that when polymorphism is not explicitly required, structural distances are high while semantic distances remain low; i.e., implementations diverge widely without changing high-level behavior. Explicit prompting substantially amplifies this structural diversity while preserving correctness, at the cost of roughly 5 times more tokens but only a small increase in LLM calls (from $4.2$ to $4.5$ per payload, with effective API costs of \$0.41 and \$0.73). These results show that a single commercial LLM can cheaply generate large populations of behaviorally equivalent yet structurally diverse payloads, facilitating the evasion of signature-based detection rules and similarity-based clustering.
CRSep 7, 2021
POW-HOW: An enduring timing side-channel to evade online malware sandboxesAntonio Nappa, Panagiotis Papadopoulos, Matteo Varvello et al.
Online malware scanners are one of the best weapons in the arsenal of cybersecurity companies and researchers. A fundamental part of such systems is the sandbox that provides an instrumented and isolated environment (virtualized or emulated) for any user to upload and run unknown artifacts and identify potentially malicious behaviors. The provided API and the wealth of information inthe reports produced by these services have also helped attackers test the efficacy of numerous techniques to make malware hard to detect.The most common technique used by malware for evading the analysis system is to monitor the execution environment, detect the presence of any debugging artifacts, and hide its malicious behavior if needed. This is usually achieved by looking for signals suggesting that the execution environment does not belong to a the native machine, such as specific memory patterns or behavioral traits of certain CPU instructions. In this paper, we show how an attacker can evade detection on such online services by incorporating a Proof-of-Work (PoW) algorithm into a malware sample. Specifically, we leverage the asymptotic behavior of the computational cost of PoW algorithms when they run on some classes of hardware platforms to effectively detect a non bare-metal environment of the malware sandbox analyzer. To prove the validity of this intuition, we design and implement the POW-HOW framework, a tool to automatically implement sandbox detection strategies and embed a test evasion program into an arbitrary malware sample. Our empirical evaluation shows that the proposed evasion technique is durable, hard to fingerprint, and reduces existing malware detection rate by a factor of 10. Moreover, we show how bare-metal environments cannot scale with actual malware submissions rates for consumer services.
CRAug 6, 2019
After You, Please: Browser Extensions Order Attacks and CountermeasuresPablo Picazo-Sanchez, Juan Tapiador, Gerardo Schneider
Browser extensions are small applications executed in the browser context that provide additional capabilities and enrich the user experience while surfing the web. The acceptance of extensions in current browsers is unquestionable. For instance, Chrome's official extension repository has more than 63,000 extensions, with some of them having more than 10M users. When installed, extensions are pushed into an internal queue within the browser. The order in which each extension executes depends on a number of factors, including their relative installation times. In this paper, we demonstrate how this order can be exploited by an unprivileged malicious extension (i.e., one with no more permissions than those already assigned when accessing web content) to get access to any private information that other extensions have previously introduced. Our solution does not require modifying the core browser engine as it is implemented as another browser extension. We prove that our approach effectively protects the user against usual attackers (i.e., any other installed extension) as well as against strong attackers having access to the effects of all installed extensions (i.e., knowing who did what). We also prove soundness and robustness of our approach under reasonable assumptions.
CRNov 16, 2018
The MalSource Dataset: Quantifying Complexity and Code Reuse in Malware DevelopmentAlejandro Calleja, Juan Tapiador, Juan Caballero
During the last decades, the problem of malicious and unwanted software (malware) has surged in numbers and sophistication. Malware plays a key role in most of today's cyber attacks and has consolidated as a commodity in the underground economy. In this work, we analyze the evolution of malware from 1975 to date from a software engineering perspective. We analyze the source code of 456 samples from 428 unique families and obtain measures of their size, code quality, and estimates of the development costs (effort, time, and number of people). Our results suggest an exponential increment of nearly one order of magnitude per decade in aspects such as size and estimated effort, with code quality metrics similar to those of benign software.We also study the extent to which code reuse is present in our dataset. We detect a significant number of code clones across malware families and report which features and functionalities are more commonly shared. Overall, our results support claims about the increasing complexity of malware and its production progressively becoming an industry.
NISep 26, 2018
Beyond Google Play: A Large-Scale Comparative Study of Chinese Android App MarketsHaoyu Wang, Zhe Liu, Jingyue Liang et al.
China is one of the largest Android markets in the world. As Chinese users cannot access Google Play to buy and install Android apps, a number of independent app stores have emerged and compete in the Chinese app market. Some of the Chinese app stores are pre-installed vendor-specific app markets (e.g., Huawei, Xiaomi and OPPO), whereas others are maintained by large tech companies (e.g., Baidu, Qihoo 360 and Tencent). The nature of these app stores and the content available through them vary greatly, including their trustworthiness and security guarantees. As of today, the research community has not studied the Chinese Android ecosystem in depth. To fill this gap, we present the first large-scale comparative study that covers more than 6 million Android apps downloaded from 16 Chinese app markets and Google Play. We focus our study on catalog similarity across app stores, their features, publishing dynamics, and the prevalence of various forms of misbehavior (including the presence of fake, cloned and malicious apps). Our findings also suggest heterogeneous developer behavior across app stores, in terms of code maintenance, use of third-party services, and so forth. Overall, Chinese app markets perform substantially worse when taking active measures to protect mobile users and legit developers from deceptive and abusive actors, showing a significantly higher prevalence of malware, fake, and cloned apps than Google Play.