Code Duplication on Stack Overflow
This paper addresses a gap in understanding code duplication on a key platform for software developers, though the contribution is incremental because it builds on existing knowledge about code clones.
The paper investigates the prevalence and implications of code duplication on Stack Overflow, finding that clones are common and diverse, with specific challenges such as user incentives for cloning and bulk edit difficulties.
Despite the undeniable importance of Stack Overflow (SO) for the daily work of many software developers, and despite existing knowledge about the impact of code duplication on software maintainability, the prevalence and implications of code clones on SO have not yet received the attention they deserve. In this paper, we motivate why studies on code duplication within SO are needed and how existing studies on code reuse differ from this new research direction. We present similarities and differences between code clones in general and code clones on SO, and point to open questions that need to be addressed to make data-informed decisions about how to properly handle clones on this important platform. We present results from a preliminary investigation, indicating that clones on SO are common and diverse. We further point to specific challenges, including incentives for users to clone successful answers and difficulties with bulk edits on the platform, and conclude with possible directions for future work.