How Reliable Are FOSS Popularity Metrics? Analyzing the Effort Required for Spoofing Common Software Popularity Metrics
For software engineering researchers and practitioners relying on FOSS metrics for ranking, dataset construction, or reward allocation, the paper shows that these metrics are easily manipulated, undermining their reliability.
The paper analyzes the spoofing effort required to manipulate common FOSS popularity metrics, finding that many categories (e.g., commit data, downloads, dependency relations) are manipulable with low to moderate effort, and identifies a sybil attack of over 70,000 spam packages on npm. The results suggest these metrics should be used with greater caution.
Quantitative metrics derived from software repositories and package ecosystems are widely used to assess the impact, popularity, maintenance, and criticality of free and open source software (FOSS) projects. However, these metrics are often assumed to be reliable despite their potential susceptibility to manipulation. Prior empirical software engineering and security research has used these metrics in a variety of ways that assume they capture project impact and popularity. Yet the extent to which the underlying signals can be spoofed in practice, and the consequences of such spoofing for downstream uses of the metrics, have received little focused attention. To address this gap, the paper decomposes existing combined metrics into atomic metric categories, analyzes their spoofing effort under a maintainer-centered threat model, and investigates a real-world sybil attack on npm connected to an impact-based reward mechanism. The analysis finds that many metric categories, especially commit data, issue-tracker activity, downloads, repository contents, and dependency relations, are manipulable with low to moderate effort, and it identifies a sybil attack comprising more than 70,000 spam packages on npm. These results imply that quantitative FOSS metrics should be used with much greater caution in software engineering research and practice, particularly for ranking, dataset construction, and any allocation or evaluation process that turns metrics into optimization targets.
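
To make the concern about combined metrics concrete, the following is a minimal sketch of how a ranking built on a weighted sum of atomic metrics could be flipped by inflating only the cheaply manipulable inputs. It is not the paper's method: the metric names, weights, log scaling, project names, and all numbers are hypothetical and chosen purely for illustration.

```python
import math

# Hypothetical weights for a combined popularity score.
# These are illustrative assumptions, not taken from the paper or any real scheme.
WEIGHTS = {"stars": 0.3, "downloads": 0.4, "dependents": 0.3}


def combined_score(metrics):
    """Weighted sum of log-scaled atomic metrics (an assumed scoring form)."""
    return sum(w * math.log10(1 + metrics[name]) for name, w in WEIGHTS.items())


def rank(projects):
    """Return project names ordered from highest to lowest combined score."""
    return sorted(projects, key=lambda name: combined_score(projects[name]), reverse=True)


# Made-up projects with made-up metric values.
projects = {
    "established-lib": {"stars": 5000, "downloads": 200_000, "dependents": 300},
    "new-lib": {"stars": 50, "downloads": 1_000, "dependents": 2},
}
before = rank(projects)

# A maintainer of "new-lib" inflates downloads and dependents, for example
# through scripted downloads and spam packages that declare a dependency.
# The star count is left unchanged.
spoofed = dict(projects)
spoofed["new-lib"] = {**projects["new-lib"], "downloads": 5_000_000, "dependents": 20_000}
after = rank(spoofed)

print("before:", before)
print("after: ", after)
```

In this toy setting the ranking flips even though only two of the three inputs were changed, which is the kind of downstream effect that makes the spoofing effort of each atomic category relevant to any score built on top of it.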