SE · Mar 2, 2021

Apples, Oranges & Fruits -- Understanding Similarity of Software Projects Through The Lens of Dissimilar Artifacts

arXiv:2103.01475v1 · Has Code
AI Analysis

This paper addresses the challenge developers face when reusing open-source software by offering a novel perspective on project similarity, though it appears incremental because it builds on existing artifact comparison methods.

The paper tackles the problem of understanding software project similarity by asking whether similarity can be found across dissimilar artifact types (documentation, commits, and source code). It reports similarities between these artifacts both in similar repositories and in different ones.

The growing availability of open source projects has made it easier for developers to reuse existing software artifacts and leverage them to develop new software. However, the notion of similarity is hard to pin down because it varies from developer to developer. Some developers search for repositories with similar source code, while others look for repositories with similar requirements or issues. Existing approaches tend to find similar projects by comparing artifacts of the same type: source code to source code, API usage to API usage, documentation to documentation, and so on. Even when two artifacts of the same type are dissimilar, two artifacts of different types may still be similar. Hence, in this paper, we aim to answer the question: can we find similarity of software repositories through dissimilar artifacts? To this end, we conduct an experiment to find similarities between three repositories (two similar projects and one different project) by comparing similar and dissimilar artifacts (documentation, commits, and source code). We observed similarities between dissimilar artifacts such as commits, source code, and README files in the context of both similar and different repositories.
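
The abstract does not say how similarity between artifacts is measured, so the following is only a minimal sketch of what a cross-artifact comparison could look like. It assumes a bag-of-words representation with cosine similarity, which is a common baseline and not necessarily the authors' method. The function names (`tokenize`, `cosine_similarity`, `artifact_similarity`) and the toy artifact strings are hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens.

    Splitting on non-letters also breaks snake_case identifiers apart,
    so source code and prose end up sharing one vocabulary.
    """
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty artifact is treated as similar to nothing
    return dot / (norm_a * norm_b)


def artifact_similarity(text_a, text_b):
    """Compare two artifacts of any type (README, commit log, source)."""
    return cosine_similarity(Counter(tokenize(text_a)), Counter(tokenize(text_b)))


# Hypothetical toy artifacts, not data from the paper.
readme_a = "A lightweight HTTP client for sending requests and parsing JSON responses."
commits_b = "Fix JSON parsing bug in HTTP response handler; add retry for requests."
source_c = "def render_sprite(canvas, sprite): canvas.draw(sprite.image, sprite.position)"

# README of one project against the commit messages of a related project.
print(artifact_similarity(readme_a, commits_b))
# README of the same project against source code from an unrelated project.
print(artifact_similarity(readme_a, source_c))
```

In this toy setup the README and the commit messages share vocabulary such as "http", "json", and "parsing", so they score higher than the README paired with unrelated source code. A real study would likely need identifier splitting, stop-word removal, and term weighting before scores of this kind become meaningful.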

