SECR · Jul 27, 2021

A Large-Scale Security-Oriented Static Analysis of Python Packages in PyPI

arXiv:2107.12699v2 · 37 citations · Has Code
Originality: Synthesis-oriented
AI Analysis

This study addresses security weaknesses in open-source Python packages, a concern for both developers and users. Its contribution is incremental, however, because it applies existing static analysis methods to a new dataset.

The paper conducts a large-scale static analysis of Python packages in PyPI and finds that about 46% of packages have at least one security issue, with over 749,000 issues identified across more than 197,000 packages.

Security issues of various kinds are a common problem for open-source packages archived in and delivered through software ecosystems. These issues often manifest as software weaknesses that may lead to concrete software vulnerabilities. This paper uses static analysis to examine various security issues in Python packages. The dataset is based on a snapshot of all packages stored in the Python Package Index (PyPI). In total, over 197 thousand packages and over 749 thousand security issues are covered. Even under the constraints imposed by static analysis, (a) the results indicate that security issues are prevalent: at least one issue is present in about 46% of the Python packages. In terms of issue types, (b) exception handling and various forms of code injection are the most common issues, with the subprocess module standing out in this regard. Reflecting the generally small size of the packages, (c) software size metrics are poor predictors of the number of issues revealed through static analysis. With these results and the accompanying discussion, the paper contributes to large-scale empirical research aimed at better understanding security problems in software ecosystems.
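The abstract names exception handling, code injection, and the subprocess module as the dominant issue types, and reports the share of packages with at least one issue. As a rough illustration of how such patterns can be detected statically, here is a minimal sketch using only Python's standard `ast` module. It is not the tool or rule set used in the paper. The rule names (`subprocess_call`, `subprocess_shell_true`, `try_except_pass`), the `Finding` type, and the helpers `scan_source` and `share_with_issues` are hypothetical, and real analyzers apply many more rules and track imports far more thoroughly.

```python
import ast
from dataclasses import dataclass


@dataclass
class Finding:
    kind: str    # hypothetical rule name, not taken from the paper
    lineno: int  # line where the flagged construct starts


class SecurityVisitor(ast.NodeVisitor):
    """Flags two illustrative patterns: subprocess calls and silently swallowed exceptions.

    Simplification: only `import subprocess` / `import subprocess as x` is tracked;
    `from subprocess import run` is not detected.
    """

    def __init__(self) -> None:
        self.findings: list[Finding] = []
        self.subprocess_names: set[str] = set()  # local names bound to the module

    def visit_Import(self, node: ast.Import) -> None:
        for alias in node.names:
            if alias.name == "subprocess":
                self.subprocess_names.add(alias.asname or alias.name)
        self.generic_visit(node)

    def visit_Call(self, node: ast.Call) -> None:
        func = node.func
        # Matches calls such as subprocess.run(...) or sp.Popen(...).
        if (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.value.id in self.subprocess_names
        ):
            # shell=True routes the command through a shell: a classic injection risk.
            shell = any(
                kw.arg == "shell"
                and isinstance(kw.value, ast.Constant)
                and kw.value.value is True
                for kw in node.keywords
            )
            kind = "subprocess_shell_true" if shell else "subprocess_call"
            self.findings.append(Finding(kind, node.lineno))
        self.generic_visit(node)

    def visit_ExceptHandler(self, node: ast.ExceptHandler) -> None:
        # A handler whose entire body is `pass` hides the error from the caller.
        if all(isinstance(stmt, ast.Pass) for stmt in node.body):
            self.findings.append(Finding("try_except_pass", node.lineno))
        self.generic_visit(node)


def scan_source(source: str) -> list[Finding]:
    """Parses (never executes) one module's source and returns its findings."""
    visitor = SecurityVisitor()
    visitor.visit(ast.parse(source))
    return visitor.findings


def share_with_issues(packages: dict[str, list[str]]) -> float:
    """Fraction of packages with at least one finding in any of their modules."""
    if not packages:
        return 0.0
    flagged = sum(
        1 for sources in packages.values() if any(scan_source(s) for s in sources)
    )
    return flagged / len(packages)


SAMPLE = """
import subprocess
try:
    subprocess.run("ls " + user_dir, shell=True)
except Exception:
    pass
"""

if __name__ == "__main__":
    for finding in scan_source(SAMPLE):
        print(finding.kind, finding.lineno)
```

Running the script prints `subprocess_shell_true 4` followed by `try_except_pass 5`. Because the code is only parsed, the undefined `user_dir` never matters, which is also why such a check cannot tell whether the shell command is actually attacker-controlled. `share_with_issues` mirrors the "at least one issue per package" measure behind the 46% figure, under the assumption that a package is represented simply as a list of module sources.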

Code Implementations: 1 repo