SI Jul 14, 2023
Are Large Language Models a Threat to Digital Public Goods? Evidence from Activity on Stack Overflow
Maria del Rio-Chanona, Nadzeya Laurentsyeva, Johannes Wachs
Large language models like ChatGPT efficiently provide users with information about various topics, presenting a potential substitute for searching the web and asking people for help online. But since users interact privately with the model, these models may drastically reduce the amount of publicly available human-generated data and knowledge resources. This substitution can present a significant problem in securing training data for future models. In this work, we investigate how the release of ChatGPT changed human-generated open data on the web by analyzing the activity on Stack Overflow, the leading online Q&A platform for computer programming. We find that relative to its Russian and Chinese counterparts, where access to ChatGPT is limited, and to similar forums for mathematics, where ChatGPT is less capable, activity on Stack Overflow significantly decreased. A difference-in-differences model estimates a 16% decrease in weekly posts on Stack Overflow. This effect increases in magnitude over time, and is larger for posts related to the most widely used programming languages. Posts made after ChatGPT's release receive voting scores similar to those of earlier posts, suggesting that ChatGPT is not merely displacing duplicate or low-quality content. These results suggest that more users are adopting large language models to answer questions and that these models are better substitutes for Stack Overflow for languages for which they have more training data. Using models like ChatGPT may be more efficient for solving certain programming problems, but their widespread adoption and the resulting shift away from public exchange on the web will limit the open data people and models can learn from in the future.
GN Apr 8
Using digital traces to analyze software work: skills, careers and programming languages
Xiangnan Feng, Johannes Wachs, Simone Daniotti et al.
Recent waves of technological transformation are reshaping work in uncertain and hard-to-predict ways. However, jobs at the forefront of the digitizing economy offer an early glimpse of these changes and leave rich activity traces. We exploit such traces in tens of millions of Question and Answer posts on Stack Overflow for the creation of a fine-grained taxonomy of software skills to analyze human capital in the global software industry. Constructing a software skill space that maps relations among these skills reveals that real-world software jobs demand highly coherent skill sets and that programmers learn through a process of related diversification. The latter process often leads to the acquisition of lower-value skills. However, when programmers use Python they preferentially target higher-value skills, offering a potential explanation for Python's successful rise as a dominant general purpose language.
SI Apr 11
Good Question! The Effect of Positive Feedback on Contributions to Online Public Goods
Johannes Wachs, Leonore Röseler, Tobias Gesche et al.
Online platforms where volunteers answer each other's questions are important sources of knowledge, yet participation is declining. We ran a pre-registered experiment on Stack Overflow, one of the largest Q&A communities for software development (N = 22,856), randomly assigning newly posted questions to receive an anonymous upvote. Within four weeks, treated users were 6.3% more likely to ask another question and 12.9% more likely to answer someone else's question. A second upvote produced no additional effect. The effect on answering was larger, more persistent, and still significant at twelve weeks. Next, we examine how much of these effects are due to algorithmic amplification, since upvotes also raise a question's rank and visibility. Algorithmic amplification is not important for the effect on asking additional questions, but it matters a lot for the effect on answering other questions. The increase in visibility increases the probability that another user provides an answer, and that experience appears to shift the poster toward broader community participation.
SE Mar 31, 2021
Mining DEV for social and technical insights about software development
Maria Papoutsoglou, Johannes Wachs, Georgia M. Kapitsaki
Software developers are social creatures: they communicate, collaborate, and promote their work in a variety of channels. Twitter, GitHub, Stack Overflow, and other platforms offer developers opportunities to network and exchange ideas. Researchers analyze content on these sites to learn about trends and topics in software engineering. However, insight mined from the text of Stack Overflow questions or GitHub issues is highly focused on detailed and technical aspects of software development. In this paper, we present a relatively new online community for software developers called DEV. On DEV users write long-form posts about their experiences, preferences, and working life in software, zooming out from specific issues and files to reflect on broader topics. About 50,000 users have posted over 140,000 articles related to software development. In this work, we describe the content of posts on DEV using a topic model, showing that developers discuss a rich variety and mixture of social and technical aspects of software development. We show that developers use DEV to promote themselves and their work: 83% link their profiles to their GitHub profiles and 56% to their Twitter profiles. 14% of users pin specific GitHub repos in their profiles. We argue that DEV is emerging as an important hub for software developers, and a valuable source of insight for researchers to complement data from platforms like GitHub and Stack Overflow.
SE Jun 3, 2020
How Gamification Affects Software Developers: Cautionary Evidence from a Natural Experiment on GitHub
Lukas Moldon, Markus Strohmaier, Johannes Wachs
We examine how the behavior of software developers changes in response to removing gamification elements from GitHub, an online platform for collaborative programming and software development. We find that the unannounced removal of daily activity streak counters from the user interface (from user profile pages) was followed by significant changes in behavior. Long-running streaks of activity were abandoned and became less common. Weekend activity decreased and days in which developers made a single contribution became less common. Synchronization of streaking behavior in the platform's social network also decreased, suggesting that gamification is a powerful channel for social influence. Focusing on a set of software developers that were publicly pursuing a goal to make contributions for 100 days in a row, we find that some of these developers abandon this quest following the removal of the public streak counter. Our findings provide evidence for the significant impact of gamification on the behavior of developers on large collaborative programming and software development platforms. They urge caution: gamification can steer the behavior of software developers in unexpected and unwanted directions.