An Empirical Comparison of Developer Retention in the RubyGems and npm Software Ecosystems
This work helps software ecosystem maintainers identify developers at risk of leaving and so mitigate knowledge loss, though it is an incremental application of survival analysis to package ecosystems.
This paper empirically compares factors leading to developer abandonment in the RubyGems and npm software ecosystems by analyzing socio-technical activity of over 30k RubyGems and 60k npm developers, finding that developers with lower engagement in discussions, weaker activity intensity, less frequent communication/commits, and shorter participation periods have higher abandonment probability.
Software ecosystems can be viewed as socio-technical networks consisting of technical components (software packages) and social components (communities of developers) that maintain the technical components. Ecosystems evolve over time through socio-technical changes that may greatly impact the ecosystem's sustainability. Social changes like developer turnover may lead to technical degradation. This motivates the need to identify the factors leading to developer abandonment, in order to automate the process of identifying developers with high abandonment risk. This paper compares such factors for two software package ecosystems, RubyGems and npm. We analyse the evolution of their packages hosted on GitHub, considering development activity in terms of commits, and social interaction with other developers in terms of comments associated with commits, issues or pull requests. We analyse this socio-technical activity for more than 30k and 60k developers for RubyGems and npm respectively. We use survival analysis to identify which factors coincide with a lower survival probability. Our results reveal that developers with a higher probability of abandoning an ecosystem: do not engage in discussions with other developers; do not have strong social and technical activity intensity; communicate or commit less frequently; and do not participate in both technical and social activities for long periods of time. Such observations could be used to automate the identification of developers with a high probability of abandoning the ecosystem and, as such, reduce the risks associated with knowledge loss.
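To illustrate what "survival probability" means here, the following is a minimal sketch of a Kaplan-Meier estimate computed for two groups of developers. The abstract does not name the estimator used, so Kaplan-Meier is an assumption (it is a common choice in survival analysis). The function name `kaplan_meier`, the grouping by engagement, and all durations are hypothetical toy values, not data from the study.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Kaplan-Meier survival curve (an assumed estimator, see lead-in).

    durations: time each developer stayed active (e.g. in months)
    observed:  True if abandonment was observed, False if the developer
               was still active when observation ended (censored)
    Returns a list of (time, survival probability) at each event time.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)      # everyone leaving the risk set at t
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Probability of surviving past t, given survival up to t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]         # drop both events and censored cases
    return curve

# Hypothetical toy data: months of activity and whether abandonment was seen
low_engagement = kaplan_meier([2, 3, 3, 5, 8],
                              [True, True, True, True, False])
high_engagement = kaplan_meier([6, 12, 12, 18, 24],
                               [True, False, True, False, False])

for label, curve in (("low", low_engagement), ("high", high_engagement)):
    print(label, [(t, round(s, 2)) for t, s in curve])
```

In this toy setup the low-engagement curve drops faster, which is the kind of difference between groups that the abstract describes as a lower survival probability.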