Marc Ohm

3 papers

305 citations

Novelty 43%

AI Score 48

Ranked #53,782 of 201,326 authors (top 27%)

#1,108 in CR (top 15%)

3 Papers

CR · May 20 · Code

How Reliable Are FOSS Popularity Metrics? Analyzing the Effort Required for Spoofing Common Software Popularity Metrics

Ben Swierzy, Timo Pohl, Marc Ohm et al.

Quantitative metrics derived from software repositories and package ecosystems are widely used to assess the impact, popularity, maintenance, and criticality of free and open source software (FOSS) projects. However, these metrics are often assumed to be reliable despite their potential susceptibility to manipulation. Prior empirical software engineering and security research has used these metrics in a variety of ways that assume they do capture project impact and popularity. Yet the extent to which the underlying signals can be spoofed in practice, and the consequences this has for downstream uses of the metrics, have received little focused attention. To address this gap, the paper decomposes existing combined metrics into atomic metric categories, analyzes their spoofing effort under a maintainer-centered threat model, and investigates a real-world Sybil attack on npm connected to an impact-based reward mechanism. The analysis finds that many metric categories, especially commit data, issue-tracker activity, downloads, repository contents, and dependency relations, are manipulable with low to moderate effort, and it identifies a Sybil attack comprising more than 70,000 spam packages on npm. These results imply that quantitative FOSS metrics should be used with much greater caution in software engineering research and practice, particularly for ranking, dataset construction, and any allocation or evaluation process that turns metrics into optimization targets.

CR · May 19, 2020 · Code

Backstabber's Knife Collection: A Review of Open Source Software Supply Chain Attacks

Marc Ohm, Henrik Plate, Arnold Sykosch et al.

A software supply chain attack is characterized by the injection of malicious code into a software package in order to compromise dependent systems further down the chain. Recent years saw a number of supply chain attacks that leverage the increasing use of open source during software development, which is facilitated by dependency managers that automatically resolve, download, and install hundreds of open source packages throughout the software life cycle. This paper presents a dataset of 174 malicious software packages that were used in real-world attacks on open source software supply chains and were distributed via the popular package repositories npm, PyPI, and RubyGems. Those packages, dating from November 2015 to November 2019, were manually collected and analyzed. The paper also presents two general attack trees to provide a structured overview of techniques to inject malicious code into the dependency tree of downstream users, and to execute such code at different times and under different conditions. This work is meant to facilitate the future development of preventive and detective safeguards by open source and research communities.

CR · Nov 4, 2020

Supporting the Detection of Software Supply Chain Attacks through Unsupervised Signature Generation

Marc Ohm, Lukas Kempf, Felix Boes et al.

Trojanized software packages used in software supply chain attacks constitute an emerging threat. Unfortunately, there is still a lack of scalable approaches that allow automated and timely detection of malicious software packages, and thus most detections rely on manual labor and expertise. However, it has been observed that most attack campaigns comprise multiple packages that share the same or similar malicious code. We leverage that fact to automatically reproduce manually identified clusters of known malicious packages that have been used in real-world attacks, thus reducing the need for expert knowledge and manual inspection. Our approach, AST Clustering using MCL to mimic Expertise (ACME), yields promising results with an $F_{1}$ score of 0.99. Signatures are automatically generated based on characteristic code fragments from clusters and are subsequently used to scan the whole npm registry for unreported malicious packages. We are able to identify and report six malicious packages, which have consequently been removed from npm. Therefore, our approach can support analysts by reducing manual labor and hence may be employed to detect possible software supply chain attacks in a timely manner.