The Software Heritage Filesystem (SwhFS): Integrating Source Code Archival with Development
SwhFS gives developers a practical tool for accessing and working with archived source code, even after that code disappears from its original location. The contribution is incremental, since it builds on existing archival infrastructure.
The authors address the problem of accessing archived open source software by introducing the Software Heritage filesystem (SwhFS), which integrates large-scale source code archival with development workflows. With it, developers can quickly "checkout" any of the 2 billion archived commits without incurring the performance cost of cloning a repository.
We introduce the Software Heritage filesystem (SwhFS), a user-space filesystem that integrates large-scale open source software archival with development workflows. SwhFS provides a POSIX filesystem view of Software Heritage, the largest public archive of software source code and version control system (VCS) development history. Using SwhFS, developers can quickly "checkout" any of the 2 billion commits archived by Software Heritage, even after they disappear from their previous known location and without incurring the performance cost of repository cloning. SwhFS works across unrelated repositories and different VCS technologies. Other source code artifacts archived by Software Heritage (individual source code files and trees, releases, and branches) can also be accessed using common programming tools and custom scripts, as if they were locally available. A screencast of SwhFS is available online at dx.doi.org/10.5281/zenodo.4531411.
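To illustrate what accessing archived artifacts "as if they were locally available" can look like from a custom script, here is a minimal Python sketch. It is not taken from the paper. It assumes the archive is already mounted and that artifacts appear under an `archive/<SWHID>` path inside the mount point; that layout, the mount point, and the helper names `content_swhid`, `archive_path`, and `count_lines` are illustrative assumptions. The only parts relied on as fact are the core SWHID syntax (`swh:1:<type>:<40 hex digits>`) and that a content identifier uses the same hash as a Git blob.

```python
import hashlib
import os
from pathlib import Path
import re

# Core SWHID syntax: swh:1:<object type>:<40 lowercase hex digits>.
SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def content_swhid(data: bytes) -> str:
    """Compute the SWHID of a file's content (same hash as a Git blob)."""
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def archive_path(mountpoint: str, swhid: str) -> Path:
    """Map a SWHID to a path under the mount point.

    Assumption: artifacts are exposed as <mountpoint>/archive/<SWHID>.
    """
    if not SWHID_RE.fullmatch(swhid):
        raise ValueError(f"not a valid core SWHID: {swhid!r}")
    return Path(mountpoint) / "archive" / swhid


def count_lines(tree: Path, suffix: str = ".py") -> int:
    """Count lines in every file under `tree` whose name ends in `suffix`.

    Uses only ordinary file I/O, so it works the same on a local
    directory and on a directory served by a mounted filesystem.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            if name.endswith(suffix):
                with open(os.path.join(dirpath, name), "rb") as fh:
                    total += sum(1 for _ in fh)
    return total
```

With a hypothetical mount point such as `/mnt/swhfs` and the SWHID of an archived directory, `count_lines(archive_path("/mnt/swhfs", swhid))` would walk the archived tree with the same code used for a local checkout. Nothing in the script is specific to the archive beyond the path construction, which is the point of exposing the archive as a POSIX filesystem.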