Generating GitHub Repository Descriptions: A Comparison of Manual and Automated Approaches
This work addresses repository discovery issues for GitHub users, but its contribution is incremental because it builds on existing summarization techniques.
The study tackled poor GitHub repository descriptions by proposing the LSP (Language, Software technology, and Purpose) template for writing clear descriptions and by comparing automated summarization methods. It found that automated summarization is adequate for generating default descriptions, but descriptions that follow the LSP template communicate most effectively with users.
Given the vast number of repositories hosted on GitHub, project discovery and retrieval have become increasingly important for GitHub users. Repository descriptions are one of the first points of contact for users who access a repository. However, repository owners often fail to provide a high-quality description: they use vague terms, explain the repository's purpose poorly, or omit the description entirely. In this work, we examine the current practice of writing GitHub repository descriptions. Based on this investigation, we propose the LSP (Language, Software technology, and Purpose) template for writing GitHub repository descriptions that are clear, concise, and informative. To understand the extent to which current automated techniques can support generating repository descriptions, we compare the performance of state-of-the-art text summarization methods on this task. Finally, our user study with GitHub users reveals that automated summarization is adequate for generating default descriptions for GitHub repositories, while descriptions that follow the LSP template are the most effective instrument for communicating with GitHub users.