65.5SEMar 19
Measuring and Exploiting Confirmation Bias in LLM-Assisted Security Code ReviewDimitris Mitropoulos, Nikolaos Alexopoulos, Georgios Alexopoulos et al.
Security code reviews increasingly rely on systems integrating Large Language Models (LLMs), ranging from interactive assistants to autonomous agents in CI/CD pipelines. We study whether confirmation bias (i.e., the tendency to favor interpretations that align with prior expectations) affects LLM-based vulnerability detection, and whether this failure mode can be exploited in software supply-chain attacks. We conduct two complementary studies. Study 1 quantifies confirmation bias through controlled experiments on 250 CVE vulnerability/patch pairs evaluated across four state-of-the-art models under five framing conditions for the review prompt. Framing a change as bug-free reduces vulnerability detection rates by 16-93%, with strongly asymmetric effects: false negatives increase sharply while false positive rates change little. Bias effects vary by vulnerability type, with injection flaws being more susceptible to them than memory corruption bugs. Study 2 evaluates exploitability in practice mimicking adversarial pull requests that reintroduce known vulnerabilities while framed as security improvements or urgent functionality fixes via their pull request metadata. Adversarial framing succeeds in 35% of cases against GitHub Copilot (interactive assistant) under one-shot attacks and in 88% of cases against Claude Code (autonomous agent) in real project configurations where adversaries can iteratively refine their framing to increase attack success. Debiasing via metadata redaction and explicit instructions restores detection in all interactive cases and 94% of autonomous cases. Our results show that confirmation bias poses a weakness in LLM-based code review, with implications on how AI-assisted development tools are deployed.
16.5CRMar 19
Cross-Ecosystem Vulnerability Analysis for Python ApplicationsGeorgios Alexopoulos, Nikolaos Alexopoulos, Thodoris Sotiropoulos et al.
Python applications depend on native libraries that may be vendored within package distributions or installed on the host system. When vulnerabilities are discovered in these libraries, determining which Python packages are affected requires cross-ecosystem analysis spanning Python dependency graphs and OS package versions. Current vulnerability scanners produce false negatives by missing vendored vulnerabilities and false positives by ignoring security patches backported by OS distributions. We present a provenance-aware vulnerability analysis approach that resolves vendored libraries to specific OS package versions or upstream releases. Our approach queries vendored libraries against a database of historical OS package artifacts using content-based hashing, and applies library-specific dynamic analyses to extract version information from binaries built from upstream source. We then construct cross-ecosystem call graphs by stitching together Python and binary call graphs across dependency boundaries, enabling reachability analysis of vulnerable functions. Evaluating on 100,000 Python packages and 10 known CVEs associated with third-party native dependencies, we identify 39 directly vulnerable packages (47M+ monthly downloads) and 312 indirectly vulnerable client packages affected through dependency chains. Our analysis achieves up to 97% false positive reduction compared to upstream version matching.
21.3CRApr 30
WOOTdroid: Whole-system Online On-device Tracing for AndroidSimon Althaus, Nikolaos Alexopoulos, Max Mühlhäuser et al.
System auditing on Android faces two problems. First, existing syscall tracers lose events under load, silently overwriting entries faster than a user space reader can drain them. Second, security-relevant application behavior is mediated through Binder, Android's kernel IPC mechanism, and is therefore hidden from the syscall layer. The Binder parcels that the kernel does see carry no method names or typed arguments, a disconnect between low-level events and high-level behavior known as the semantic gap. Existing approaches address the semantic gap either by modifying the Android platform, making them difficult to adjust to OS updates, or by instrumenting the traced application in user space, which sophisticated adversaries can evade by bypassing the instrumented framework APIs. We present WOOTdroid, a design and prototype for on-device tracing on stock Android that addresses both problems without OS modification or application instrumentation. WDSys, an eBPF port of eAudit-style syscall auditing, runs on current Android with at most 3.6% Geekbench overhead and traces 33% more syscalls than ftrace. WDBind captures Binder parcels in the kernel and decodes them out-of-process against a framework signature table extracted via Java reflection. We demonstrate WOOTdroid on Pixel 9 devices running Android 16 with an end-to-end case study reconstructing ten security-relevant Binder transactions.
4.5CRMar 31
An Empirical Comparison of Security and Privacy Characteristics of Android Messaging AppsIoannis Karyotakis, Foivos Timotheos Proestakis, Evangelos Talos et al.
Mobile messaging apps are a fundamental communication infrastructure, used by billions of people every day to share information, including sensitive data. Security and Privacy are thus critical concerns for such applications. Although the cryptographic protocols prevalent in messaging apps are generally well studied, other relevant implementation characteristics of such apps, such as their software architecture, permission use, and network-related runtime behavior, have not received enough attention. In this paper, we present a methodology for comparing implementation characteristics of messaging applications by employing static and dynamic analysis under reproducible scenarios to identify discrepancies with potential security and privacy implications. We apply this methodology to study the Android clients of the Meta Messenger, Signal, and Telegram apps. Our main findings reveal discrepancies in application complexity, attack surface, and network behavior. Statically, Messenger presents the largest attack surface and the highest number of static analysis warnings, while Telegram requests the most dangerous permissions. In contrast, Signal consistently demonstrates a minimalist design with the fewest dependencies and dangerous permissions. Dynamically, these differences are reflected in network activity; Messenger is by far the most active, exhibiting persistent background communication, whereas Signal is the least active. Furthermore, our analysis shows that all applications properly adhere to the Android permission model, with no evidence of unauthorized data access.
CRMay 9, 2019
TRIDEnT: Building Decentralized Incentives for Collaborative SecurityNikolaos Alexopoulos, Emmanouil Vasilomanolakis, Stephane Le Roux et al.
Sophisticated mass attacks, especially when exploiting zero-day vulnerabilities, have the potential to cause destructive damage to organizations and critical infrastructure. To timely detect and contain such attacks, collaboration among the defenders is critical. By correlating real-time detection information (alerts) from multiple sources (collaborative intrusion detection), defenders can detect attacks and take the appropriate defensive measures in time. However, although the technical tools to facilitate collaboration exist, real-world adoption of such collaborative security mechanisms is still underwhelming. This is largely due to a lack of trust and participation incentives for companies and organizations. This paper proposes TRIDEnT, a novel collaborative platform that aims to enable and incentivize parties to exchange network alert data, thus increasing their overall detection capabilities. TRIDEnT allows parties that may be in a competitive relationship, to selectively advertise, sell and acquire security alerts in the form of (near) real-time peer-to-peer streams. To validate the basic principles behind TRIDEnT, we present an intuitive game-theoretic model of alert sharing, that is of independent interest, and show that collaboration is bound to take place infinitely often. Furthermore, to demonstrate the feasibility of our approach, we instantiate our design in a decentralized manner using Ethereum smart contracts and provide a fully functional prototype.
CRJan 17, 2018
M-STAR: A Modular, Evidence-based Software Trustworthiness FrameworkNikolaos Alexopoulos, Sheikh Mahbub Habib, Steffen Schulz et al.
Despite years of intensive research in the field of software vulnerabilities discovery, exploits are becoming ever more common. Consequently, it is more necessary than ever to choose software configurations that minimize systems' exposure surface to these threats. In order to support users in assessing the security risks induced by their software configurations and in making informed decisions, we introduce M-STAR, a Modular Software Trustworthiness ARchitecture and framework for probabilistically assessing the trustworthiness of software systems, based on evidence, such as their vulnerability history and source code properties. Integral to M-STAR is a software trustworthiness model, consistent with the concept of computational trust. Computational trust models are rooted in Bayesian probability and Dempster-Shafer Belief theory, offering mathematical soundness and expressiveness to our framework. To evaluate our framework, we instantiate M-STAR for Debian Linux packages, and investigate real-world deployment scenarios. In our experiments with real-world data, M-STAR could assess the relative trustworthiness of complete software configurations with an error of less than 10%. Due to its modular design, our proposed framework is agile, as it can incorporate future advances in the field of code analysis and vulnerability prediction. Our results point out that M-STAR can be a valuable tool for system administrators, regular users and developers, helping them assess and manage risks associated with their software configurations.
CRNov 13, 2017
Beyond the Hype: On Using Blockchains in Trust Management for AuthenticationNikolaos Alexopoulos, Jörg Daubert, Max Mühlhäuser et al.
Trust Management (TM) systems for authentication are vital to the security of online interactions, which are ubiquitous in our everyday lives. Various systems, like the Web PKI (X.509) and PGP's Web of Trust are used to manage trust in this setting. In recent years, blockchain technology has been introduced as a panacea to our security problems, including that of authentication, without sufficient reasoning, as to its merits.In this work, we investigate the merits of using open distributed ledgers (ODLs), such as the one implemented by blockchain technology, for securing TM systems for authentication. We formally model such systems, and explore how blockchain can help mitigate attacks against them. After formal argumentation, we conclude that in the context of Trust Management for authentication, blockchain technology, and ODLs in general, can offer considerable advantages compared to previous approaches. Our analysis is, to the best of our knowledge, the first to formally model and argue about the security of TM systems for authentication, based on blockchain technology. To achieve this result, we first provide an abstract model for TM systems for authentication. Then, we show how this model can be conceptually encoded in a blockchain, by expressing it as a series of state transitions. As a next step, we examine five prevalent attacks on TM systems, and provide evidence that blockchain-based solutions can be beneficial to the security of such systems, by mitigating, or completely negating such attacks.