Jan Pennekamp

CR
10papers
118citations
Novelty41%
AI Score42

10 Papers

59.9CRApr 22
Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints

Jan Pennekamp, Johannes Lohmöller, David Schütte et al.

Preprints are essential for the timely and open dissemination of research. arXiv, the most widely used preprint service, takes the idea of open science one step further by not only publishing the actual preprints but also LaTeX sources and other files used to create them. As known from other contexts, such as GitHub repositories, and anecdotally exemplified for arXiv, making source code publicly available risks disclosing otherwise "hidden" information. Consequently, the public availability of paper sources raises the question of how much sensitive content is (unintentionally) disclosed through them. In this paper, we systematically answer this question for all 2.7M arXiv submissions with available source files across three dimensions of source file-induced information disclosure: (1) inclusion of unnecessary files, (2) metadata embedded in files, and (3) irrelevant content in files such as source code comments. Our analysis reveals that nearly every arXiv submission contains some form of "hidden" information. Notable findings range from links to editable web documents for internal coordination over API and private keys to complete Git histories. While different tools promise to remove such information from source files, we show that they fail to reliably achieve the intended cleaning functionality. To mitigate this situation, we provide ALC-NG to comprehensively remove files, metadata, and comments that are not needed to compile a LaTeX paper.

CRDec 2, 2024
PASTA-4-PHT: A Pipeline for Automated Security and Technical Audits for the Personal Health Train

Sascha Welten, Karl Kindermann, Ahmet Polat et al.

With the introduction of data protection regulations, the need for innovative privacy-preserving approaches to process and analyse sensitive data has become apparent. One approach is the Personal Health Train (PHT) that brings analysis code to the data and conducts the data processing at the data premises. However, despite its demonstrated success in various studies, the execution of external code in sensitive environments, such as hospitals, introduces new research challenges because the interactions of the code with sensitive data are often incomprehensible and lack transparency. These interactions raise concerns about potential effects on the data and increases the risk of data breaches. To address this issue, this work discusses a PHT-aligned security and audit pipeline inspired by DevSecOps principles. The automated pipeline incorporates multiple phases that detect vulnerabilities. To thoroughly study its versatility, we evaluate this pipeline in two ways. First, we deliberately introduce vulnerabilities into a PHT. Second, we apply our pipeline to five real-world PHTs, which have been utilised in real-world studies, to audit them for potential vulnerabilities. Our evaluation demonstrates that our designed pipeline successfully identifies potential vulnerabilities and can be applied to real-world studies. In compliance with the requirements of the GDPR for data management, documentation, and protection, our automated approach supports researchers using in their data-intensive work and reduces manual overhead. It can be used as a decision-making tool to assess and document potential vulnerabilities in code for data processing. Ultimately, our work contributes to an increased security and overall transparency of data processing activities within the PHT framework.

CRMar 6
Supporting Artifact Evaluation with LLMs: A Study with Published Security Research Papers

David Heye, Karl Kindermann, Robin Decker et al.

Artifact Evaluation (AE) is essential for ensuring the transparency and reliability of research, closing the gap between exploratory work and real-world deployment is particularly important in cybersecurity, particularly in IoT and CPSs, where large-scale, heterogeneous, and privacy-sensitive data meet safety-critical actuation. Yet, manual reproducibility checks are time-consuming and do not scale with growing submission volumes. In this work, we demonstrate that Large Language Models (LLMs) can provide powerful support for AE tasks: (i) text-based reproducibility rating, (ii) autonomous sandboxed execution environment preparation, and (iii) assessment of methodological pitfalls. Our reproducibility-assessment toolkit yields an accuracy of over 72% and autonomously sets up execution environments for 28% of runnable cybersecurity artifacts. Our automated pitfall assessment detects seven prevalent pitfalls with high accuracy ($F_1$ > 92%). Hence, the toolkit significantly reduces reviewer effort and, when integrated into established AE processes, could incentivize authors to submit higher-quality and more reproducible artifacts. IoT, CPS, and cybersecurity conferences and workshops may integrate the toolkit into their peer-review processes to support reviewers' decisions on awarding artifact badges, improving the overall sustainability of the process.

CRMay 18, 2022
A False Sense of Security? Revisiting the State of Machine Learning-Based Industrial Intrusion Detection

Dominik Kus, Eric Wagner, Jan Pennekamp et al.

Anomaly-based intrusion detection promises to detect novel or unknown attacks on industrial control systems by modeling expected system behavior and raising corresponding alarms for any deviations.As manually creating these behavioral models is tedious and error-prone, research focuses on machine learning to train them automatically, achieving detection rates upwards of 99%. However, these approaches are typically trained not only on benign traffic but also on attacks and then evaluated against the same type of attack used for training. Hence, their actual, real-world performance on unknown (not trained on) attacks remains unclear. In turn, the reported near-perfect detection rates of machine learning-based intrusion detection might create a false sense of security. To assess this situation and clarify the real potential of machine learning-based industrial intrusion detection, we develop an evaluation methodology and examine multiple approaches from literature for their performance on unknown attacks (excluded from training). Our results highlight an ineffectiveness in detecting unknown attacks, with detection rates dropping to between 3.2% and 14.7% for some types of attacks. Moving forward, we derive recommendations for further research on machine learning-based approaches to ensure clarity on their ability to detect unknown attacks.

CRDec 21, 2021
Collaboration is not Evil: A Systematic Look at Security Research for Industrial Use

Jan Pennekamp, Erik Buchholz, Markus Dahlmanns et al.

Following the recent Internet of Things-induced trends on digitization in general, industrial applications will further evolve as well. With a focus on the domains of manufacturing and production, the Internet of Production pursues the vision of a digitized, globally interconnected, yet secure environment by establishing a distributed knowledge base. Background. As part of our collaborative research of advancing the scope of industrial applications through cybersecurity and privacy, we identified a set of common challenges and pitfalls that surface in such applied interdisciplinary collaborations. Aim. Our goal with this paper is to support researchers in the emerging field of cybersecurity in industrial settings by formalizing our experiences as reference for other research efforts, in industry and academia alike. Method. Based on our experience, we derived a process cycle of performing such interdisciplinary research, from the initial idea to the eventual dissemination and paper writing. This presented methodology strives to successfully bootstrap further research and to encourage further work in this emerging area. Results. Apart from our newly proposed process cycle, we report on our experiences and conduct a case study applying this methodology, raising awareness for challenges in cybersecurity research for industrial applications. We further detail the interplay between our process cycle and the data lifecycle in applied research data management. Finally, we augment our discussion with an industrial as well as an academic view on this research area and highlight that both areas still have to overcome significant challenges to sustainably and securely advance industrial applications. Conclusions. With our proposed process cycle for interdisciplinary research in the intersection of cybersecurity and industrial application, we provide a foundation for further research.

CRNov 26, 2021
CoinPrune: Shrinking Bitcoin's Blockchain Retrospectively

Roman Matzutt, Benedikt Kalde, Jan Pennekamp et al.

Popular cryptocurrencies continue to face serious scalability issues due to their ever-growing blockchains. Thus, modern blockchain designs began to prune old blocks and rely on recent snapshots for their bootstrapping processes instead. Unfortunately, established systems are often considered incapable of adopting these improvements. In this work, we present CoinPrune, our block-pruning scheme with full Bitcoin compatibility, to revise this popular belief. CoinPrune bootstraps joining nodes via snapshots that are periodically created from Bitcoin's set of unspent transaction outputs (UTXO set). Our scheme establishes trust in these snapshots by relying on CoinPrune-supporting miners to mutually reaffirm a snapshot's correctness on the blockchain. This way, snapshots remain trustworthy even if adversaries attempt to tamper with them. Our scheme maintains its retrospective deployability by relying on positive feedback only, i.e., blocks containing invalid reaffirmations are not rejected, but invalid reaffirmations are outpaced by the benign ones created by an honest majority among CoinPrune-supporting miners. Already today, CoinPrune reduces the storage requirements for Bitcoin nodes by two orders of magnitude, as joining nodes need to fetch and process only 6 GiB instead of 271 GiB of data in our evaluation, reducing the synchronization time of powerful devices from currently 7 h to 51 min, with even larger potential drops for less powerful devices. CoinPrune is further aware of higher-level application data, i.e., it conserves otherwise pruned application data and allows nodes to obfuscate objectionable and potentially illegal blockchain content from their UTXO set and the snapshots they distribute.

CROct 26, 2020
Easing the Conscience with OPC UA: An Internet-Wide Study on Insecure Deployments

Markus Dahlmanns, Johannes Lohmöller, Ina Berenice Fink et al.

Due to increasing digitalization, formerly isolated industrial networks, e.g., for factory and process automation, move closer and closer to the Internet, mandating secure communication. However, securely setting up OPC UA, the prime candidate for secure industrial communication, is challenging due to a large variety of insecure options. To study whether Internet-facing OPC UA appliances are configured securely, we actively scan the IPv4 address space for publicly reachable OPC UA systems and assess the security of their configurations. We observe problematic security configurations such as missing access control (on 24% of hosts), disabled security functionality (24%), or use of deprecated cryptographic primitives (25%) on in total 92% of the reachable deployments. Furthermore, we discover several hundred devices in multiple autonomous systems sharing the same security certificate, opening the door for impersonation attacks. Overall, in this paper, we highlight commonly found security misconfigurations and underline the importance of appropriate configuration for security-featuring protocols.

CRApr 15, 2020
How to Securely Prune Bitcoin's Blockchain

Roman Matzutt, Benedikt Kalde, Jan Pennekamp et al.

Bitcoin was the first successful decentralized cryptocurrency and remains the most popular of its kind to this day. Despite the benefits of its blockchain, Bitcoin still faces serious scalability issues, most importantly its ever-increasing blockchain size. While alternative designs introduced schemes to periodically create snapshots and thereafter prune older blocks, already-deployed systems such as Bitcoin are often considered incapable of adopting corresponding approaches. In this work, we revise this popular belief and present CoinPrune, a snapshot-based pruning scheme that is fully compatible with Bitcoin. CoinPrune can be deployed through an opt-in velvet fork, i.e., without impeding the established Bitcoin network. By requiring miners to publicly announce and jointly reaffirm recent snapshots on the blockchain, CoinPrune establishes trust into the snapshots' correctness even in the presence of powerful adversaries. Our evaluation shows that CoinPrune reduces the storage requirements of Bitcoin already by two orders of magnitude today, with further relative savings as the blockchain grows. In our experiments, nodes only have to fetch and process 5 GiB instead of 230 GiB of data when joining the network, reducing the synchronization time on powerful devices from currently 5 h to 46 min, with even more savings for less powerful devices.

CRApr 14, 2020
Utilizing Public Blockchains for the Sybil-Resistant Bootstrapping of Distributed Anonymity Services

Roman Matzutt, Jan Pennekamp, Erik Buchholz et al.

Distributed anonymity services, such as onion routing networks or cryptocurrency tumblers, promise privacy protection without trusted third parties. While the security of these services is often well-researched, security implications of their required bootstrapping processes are usually neglected: Users either jointly conduct the anonymization themselves, or they need to rely on a set of non-colluding privacy peers. However, the typically small number of privacy peers enable single adversaries to mimic distributed services. We thus present AnonBoot, a Sybil-resistant medium to securely bootstrap distributed anonymity services via public blockchains. AnonBoot enforces that peers periodically create a small proof of work to refresh their eligibility for providing secure anonymity services. A pseudo-random, locally replicable bootstrapping process using on-chain entropy then prevents biasing the election of eligible peers. Our evaluation using Bitcoin as AnonBoot's underlying blockchain shows its feasibility to maintain a trustworthy repository of 1000 peers with only a small storage footprint while supporting arbitrarily large user bases on top of most blockchains.

CRMar 27, 2020
Assessing the Security of OPC UA Deployments

Linus Roepert, Markus Dahlmanns, Ina Berenice Fink et al.

To address the increasing security demands of industrial deployments, OPC UA is one of the first industrial protocols explicitly designed with security in mind. However, deploying it securely requires a thorough configuration of a wide range of options. Thus, assessing the security of OPC UA deployments and their configuration is necessary to ensure secure operation, most importantly confidentiality and integrity of industrial processes. In this work, we present extensions to the popular Metasploit Framework to ease network-based security assessments of OPC UA deployments. To this end, we discuss methods to discover OPC UA servers, test their authentication, obtain their configuration, and check for vulnerabilities. Ultimately, our work enables operators to verify the (security) configuration of their systems and identify potential attack vectors.