59.9 CR Apr 22
Hidden Secrets in the arXiv: Discovering, Analyzing, and Preventing Unintentional Information Disclosure in Source Files of Scientific Preprints
Jan Pennekamp, Johannes Lohmöller, David Schütte et al.
Preprints are essential for the timely and open dissemination of research. arXiv, the most widely used preprint service, takes the idea of open science one step further by publishing not only the actual preprints but also the LaTeX sources and other files used to create them. As known from other contexts, such as GitHub repositories, and anecdotally exemplified for arXiv, making source code publicly available risks disclosing otherwise "hidden" information. Consequently, the public availability of paper sources raises the question of how much sensitive content is (unintentionally) disclosed through them. In this paper, we systematically answer this question for all 2.7M arXiv submissions with available source files across three dimensions of source-file-induced information disclosure: (1) inclusion of unnecessary files, (2) metadata embedded in files, and (3) irrelevant content in files, such as source code comments. Our analysis reveals that nearly every arXiv submission contains some form of "hidden" information. Notable findings range from links to editable web documents for internal coordination, through API and private keys, to complete Git histories. While different tools promise to remove such information from source files, we show that they fail to reliably achieve the intended cleaning functionality. To mitigate this situation, we provide ALC-NG to comprehensively remove files, metadata, and comments that are not needed to compile a LaTeX paper.
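To make the third dimension (comments left in source files) concrete, the following is a minimal Python sketch of stripping comments from LaTeX source. It is not ALC-NG and does not reflect how that tool works; the function names `strip_comment` and `strip_comments` are hypothetical. It assumes that an unescaped `%` starts a comment and deliberately ignores verbatim environments, `\verb`, and URLs containing `%`, all of which a real cleaner would have to handle.

```python
def strip_comment(line: str) -> str:
    """Remove a trailing LaTeX comment from one line.

    The '%' itself is kept so that LaTeX still swallows the line break,
    which leaves the compiled output unchanged.
    """
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == "\\":
            # Skip the backslash and the character it escapes (e.g. \% or \\).
            i += 2
            continue
        if ch == "%":
            # Unescaped percent sign: everything after it is a comment.
            return line[:i] + "%"
        i += 1
    return line


def strip_comments(source: str) -> str:
    """Apply strip_comment to every line of a LaTeX document."""
    return "\n".join(strip_comment(line) for line in source.split("\n"))


if __name__ == "__main__":
    sample = "We scan 50\\% of hosts. % TODO: ask about the real number\nNext line."
    print(strip_comments(sample))
```

Scanning character by character, rather than using a single regular expression, keeps cases such as `\\%` (a line break followed by a comment) correct.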
CR Dec 2, 2024
PASTA-4-PHT: A Pipeline for Automated Security and Technical Audits for the Personal Health Train
Sascha Welten, Karl Kindermann, Ahmet Polat et al.
With the introduction of data protection regulations, the need for innovative privacy-preserving approaches to process and analyse sensitive data has become apparent. One approach is the Personal Health Train (PHT), which brings analysis code to the data and conducts the data processing at the data premises. However, despite its demonstrated success in various studies, the execution of external code in sensitive environments, such as hospitals, introduces new research challenges because the interactions of the code with sensitive data are often incomprehensible and lack transparency. These interactions raise concerns about potential effects on the data and increase the risk of data breaches. To address this issue, this work discusses a PHT-aligned security and audit pipeline inspired by DevSecOps principles. The automated pipeline incorporates multiple phases that detect vulnerabilities. To thoroughly study its versatility, we evaluate this pipeline in two ways. First, we deliberately introduce vulnerabilities into a PHT. Second, we apply our pipeline to five real-world PHTs, which have been utilised in real-world studies, to audit them for potential vulnerabilities. Our evaluation demonstrates that our pipeline successfully identifies potential vulnerabilities and can be applied to real-world studies. In compliance with the requirements of the GDPR for data management, documentation, and protection, our automated approach supports researchers in their data-intensive work and reduces manual overhead. It can be used as a decision-making tool to assess and document potential vulnerabilities in code for data processing. Ultimately, our work contributes to increased security and overall transparency of data processing activities within the PHT framework.
CR Oct 26, 2020
Easing the Conscience with OPC UA: An Internet-Wide Study on Insecure Deployments
Markus Dahlmanns, Johannes Lohmöller, Ina Berenice Fink et al.
Due to increasing digitalization, formerly isolated industrial networks, e.g., for factory and process automation, move closer and closer to the Internet, mandating secure communication. However, securely setting up OPC UA, the prime candidate for secure industrial communication, is challenging due to a large variety of insecure options. To study whether Internet-facing OPC UA appliances are configured securely, we actively scan the IPv4 address space for publicly reachable OPC UA systems and assess the security of their configurations. We observe problematic security configurations such as missing access control (on 24% of hosts), disabled security functionality (24%), or use of deprecated cryptographic primitives (25%) on a total of 92% of the reachable deployments. Furthermore, we discover several hundred devices in multiple autonomous systems sharing the same security certificate, opening the door for impersonation attacks. Overall, in this paper, we highlight commonly found security misconfigurations and underline the importance of appropriate configuration for security-featuring protocols.
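The three misconfiguration classes named in the abstract can be illustrated with a small Python sketch that labels a host based on the endpoints it advertises. This is an illustration under stated assumptions, not the study's methodology: the `Endpoint` record, its field names, the `host_findings` function, and the decision rules are hypothetical, and the set of policies treated as deprecated (`Basic128Rsa15`, `Basic256`) is an assumption that should be checked against the current OPC UA specification.

```python
from dataclasses import dataclass
from typing import List

# Assumption: security policies treated as deprecated in this sketch.
DEPRECATED_POLICIES = {"Basic128Rsa15", "Basic256"}


@dataclass
class Endpoint:
    """Hypothetical, simplified view of one advertised OPC UA endpoint."""
    security_mode: str      # "None", "Sign", or "SignAndEncrypt"
    security_policy: str    # e.g. "None", "Basic256Sha256"
    allows_anonymous: bool  # True if an anonymous user token is accepted


def host_findings(endpoints: List[Endpoint]) -> List[str]:
    """Return the sorted list of misconfiguration labels for one host."""
    issues = set()
    if any(ep.allows_anonymous for ep in endpoints):
        issues.add("missing access control")
    # Assumed rule: security counts as disabled if no endpoint signs or encrypts.
    if endpoints and all(ep.security_mode == "None" for ep in endpoints):
        issues.add("disabled security functionality")
    if any(ep.security_policy in DEPRECATED_POLICIES for ep in endpoints):
        issues.add("deprecated cryptographic primitives")
    return sorted(issues)


if __name__ == "__main__":
    host = [Endpoint("None", "None", True)]
    print(host_findings(host))
```

A host can fall into several classes at once, which is why the function returns a list rather than a single label.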