Adam Tutko

h-index2

3papers

65citations

Novelty32%

AI Score22

Ranked #177,491 of 194,257 authors (top 91%)#2,221 in SE (top 73%)

3 Papers

3.6SEMar 23, 2021Code

Tracing Vulnerable Code Lineage

David Reid, Kalvin Eng, Chris Bogart et al.

This paper presents results from the MSR 2021 Hackathon. Our team investigates files/projects that contain known security vulnerabilities and how widespread they are throughout repositories in open source software. These security vulnerabilities can potentially be propagated through code reuse even when the vulnerability is fixed in different versions of the code. We utilize the World of Code infrastructure to discover file-level duplication of code from a nearly complete collection of open source software. This paper describes a method and set of tools to find all open source projects that use known vulnerable files and any previous revisions of those files.

15.7SEOct 30, 2020

World of Code: Enabling a Research Workflow for Mining and Analyzing the Universe of Open Source VCS data

Yuxing Ma, Tapajit Dey, Chris Bogart et al.

Open source software (OSS) is essential for modern society and, while substantial research has been done on individual (typically central) projects, only a limited understanding of the periphery of the entire OSS ecosystem exists. For example, how are the tens of millions of projects in the periphery interconnected through. technical dependencies, code sharing, or knowledge flow? To answer such questions we: a) create a very large and frequently updated collection of version control data in the entire FLOSS ecosystems named World of Code (WoC), that can completely cross-reference authors, projects, commits, blobs, dependencies, and history of the FLOSS ecosystems and b) provide capabilities to efficiently correct, augment, query, and analyze that data. Our current WoC implementation is capable of being updated on a monthly basis and contains over 18B Git objects. To evaluate its research potential and to create vignettes for its usage, we employ WoC in conducting several research tasks. In particular, we find that it is capable of supporting trend evaluation, ecosystem measurement, and the determination of package usage. We expect WoC to spur investigation into global properties of OSS development leading to increased resiliency of the entire OSS ecosystem. Our infrastructure facilitates the discovery of key technical dependencies, code flow, and social networks that provide the basis to determine the structure and evolution of the relationships that drive FLOSS activities and innovation.

5.3SEAug 8, 2020

More Effective Software Repository Mining

Adam Tutko, Austin Henley, Audris Mockus

Background: Data mining and analyzing of public Git software repositories is a growing research field. The tools used for studies that investigate a single project or a group of projects have been refined, but it is not clear whether the results obtained on such ``convenience samples'' generalize. Aims: This paper aims to elucidate the difficulties faced by researchers who would like to ascertain the generalizability of their findings by introducing an interface that addresses the issues with obtaining representative samples. Results: To do that we explore how to exploit the World of Code system to make software repository sampling and analysis much more accessible. Specifically, we present a resource for Mining Software Repository researchers that is intended to simplify data sampling and retrieval workflow and, through that, increase the validity and completeness of data. Conclusions: This system has the potential to provide researchers a resource that greatly eases the difficulty of data retrieval and addresses many of the currently standing issues with data sampling.