SEJul 27, 2021Code
A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPIJukka Ruohonen, Kalle Hjerppe, Kalle Rindell
Different security issues are a common problem for open source packages archived to and delivered through software ecosystems. These often manifest themselves as software weaknesses that may lead to concrete software vulnerabilities. This paper examines various security issues in Python packages with static analysis. The dataset is based on a snapshot of all packages stored to the Python Package Index (PyPI). In total, over 197 thousand packages and over 749 thousand security issues are covered. Even under the constraints imposed by static analysis, (a) the results indicate prevalence of security issues; at least one issue is present for about 46% of the Python packages. In terms of the issue types, (b) exception handling and different code injections have been the most common issues. The subprocess module stands out in this regard. Reflecting the generally small size of the packages, (c) software size metrics do not predict well the amount of issues revealed through static analysis. With these results and the accompanying discussion, the paper contributes to the field of large-scale empirical studies for better understanding security problems in software ecosystems.
SEApr 30, 2020
Extracting Layered Privacy Language Purposes from Web ServicesKalle Hjerppe, Jukka Ruohonen, Ville Leppänen
Web services are important in the processing of personal data in the World Wide Web. In light of recent data protection regulations, this processing raises a question about consent or other basis of legal processing. While a consent must be informed, many web services fail to provide enough information for users to make informed decisions. Privacy policies and privacy languages are one way for addressing this problem; the former document how personal data is processed, while the latter describe this processing formally. In this paper, the socalled Layered Privacy Language (LPL) is coupled with web services in order to express personal data processing with a formal analysis method that seeks to generate the processing purposes for privacy policies. To this end, the paper reviews the background theory as well as proposes a method and a concrete tool. The results are demonstrated with a small case study.
SEMar 22, 2020
Annotation-Based Static Analysis for Personal Data ProtectionKalle Hjerppe, Jukka Ruohonen, Ville Leppänen
This paper elaborates the use of static source code analysis in the context of data protection. The topic is important for software engineering in order for software developers to improve the protection of personal data during software development. To this end, the paper proposes a design of annotating classes and functions that process personal data. The design serves two primary purposes: on one hand, it provides means for software developers to document their intent; on the other hand, it furnishes tools for automatic detection of potential violations. This dual rationale facilitates compliance with the General Data Protection Regulation (GDPR) and other emerging data protection and privacy regulations. In addition to a brief review of the state-of-the-art of static analysis in the data protection context and the design of the proposed analysis method, a concrete tool is presented to demonstrate a practical implementation for the Java programming language.
SEJul 17, 2019
The General Data Protection Regulation: Requirements, Architectures, and ConstraintsKalle Hjerppe, Jukka Ruohonen, Ville Leppänen
The General Data Protection Regulation (GDPR) in the European Union is the most famous recently enacted privacy regulation. Despite of the regulation's legal, political, and technological ramifications, relatively little research has been carried out for better understanding the GDPR's practical implications for requirements engineering and software architectures. Building on a grounded theory approach with close ties to the Finnish software industry, this paper contributes to the sealing of this gap in previous research. Three questions are asked and answered in the context of software development organizations. First, the paper elaborates nine practical constraints under which many small and medium-sized enterprises (SMEs) often operate when implementing solutions that address the new regulatory demands. Second, the paper elicits nine regulatory requirements from the GDPR for software architectures. Third, the paper presents an implementation for a software architecture that complies both with the requirements elicited and the constraints elaborated.