The Maven Dependency Graph: a Temporal Graph-based Representation of Maven Central
The Maven Dependency Graph gives researchers and developers a resource for studying Java application architecture and evolution. The contribution is incremental, however, since it repackages existing data into a more accessible format.
The paper tackles the challenge of analyzing the Maven Central Repository, which holds 2.8M artifacts and their dependencies. It does so by creating the Maven Dependency Graph, an open-source dataset stored in a graph database that models dependencies explicitly and comes with query infrastructure.
The Maven Central Repository provides an extraordinary source of data for understanding complex architecture and evolution phenomena among Java applications. As of September 6, 2018, this repository includes 2.8M artifacts (an artifact is a compiled piece of code implemented in a JVM-based language), each of which is characterized by metadata such as its exact version, upload date, and list of dependencies on other artifacts. Today, anyone who wants to analyze the complete ecosystem of Maven artifacts and their dependencies faces two key challenges: (i) this is a huge dataset; and (ii) dependency relationships among artifacts are not modeled explicitly and cannot be queried. In this paper, we present the Maven Dependency Graph. This open-source dataset provides two contributions: (i) a snapshot of the whole of Maven Central taken on September 6, 2018, stored in a graph database in which we explicitly model all dependencies; and (ii) an open-source infrastructure to query this huge dataset.
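To make concrete what modeling dependencies explicitly buys, the following is a minimal in-memory sketch in Python. It is an illustration only, not the dataset's actual schema or query infrastructure: the `Artifact` and `DependencyGraph` classes, their method names, and the three `org.example` artifacts are all hypothetical. Nodes carry the metadata named above (coordinates and upload date), edges record dependencies, and two queries become straightforward: the transitive dependencies of an artifact, and its direct dependents as of a given date.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Artifact:
    """Hypothetical node type: Maven coordinates plus upload date."""
    group_id: str
    artifact_id: str
    version: str
    released: date

    @property
    def coords(self) -> str:
        return f"{self.group_id}:{self.artifact_id}:{self.version}"


class DependencyGraph:
    """Hypothetical directed graph with explicit dependency edges."""

    def __init__(self):
        self.artifacts = {}                  # coords -> Artifact
        self.depends_on = defaultdict(set)   # coords -> coords it depends on
        self.dependents = defaultdict(set)   # coords -> coords depending on it

    def add_artifact(self, artifact):
        self.artifacts[artifact.coords] = artifact

    def add_dependency(self, source, target):
        # Store the edge in both directions so either side can be queried.
        self.depends_on[source].add(target)
        self.dependents[target].add(source)

    def transitive_dependencies(self, coords):
        """Breadth-first traversal over outgoing dependency edges."""
        seen = set()
        queue = deque([coords])
        while queue:
            current = queue.popleft()
            for nxt in self.depends_on.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def direct_dependents_as_of(self, coords, cutoff):
        """Direct dependents that were already uploaded on the cutoff date."""
        return {
            c for c in self.dependents.get(coords, ())
            if self.artifacts[c].released <= cutoff
        }


# Made-up example data: app -> util -> core, and app -> core.
graph = DependencyGraph()
for art in (
    Artifact("org.example", "core", "1.0", date(2017, 1, 10)),
    Artifact("org.example", "util", "2.1", date(2017, 6, 1)),
    Artifact("org.example", "app", "0.9", date(2018, 3, 15)),
):
    graph.add_artifact(art)

graph.add_dependency("org.example:util:2.1", "org.example:core:1.0")
graph.add_dependency("org.example:app:0.9", "org.example:util:2.1")
graph.add_dependency("org.example:app:0.9", "org.example:core:1.0")

print(sorted(graph.transitive_dependencies("org.example:app:0.9")))
print(sorted(graph.direct_dependents_as_of("org.example:core:1.0", date(2017, 12, 31))))
```

The sketch stores each edge in both directions, which trades memory for constant-time lookup of dependents. At the scale of 2.8M artifacts, a dedicated graph database takes over this role.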