Frank Schweitzer

SE
h-index97
14papers
376citations
Novelty36%
AI Score36

14 Papers

SISep 7, 2022
Reconstructing signed relations from interaction data

Georges Andres, Giona Casiraghi, Giacomo Vaccario et al.

Positive and negative relations play an essential role in human behavior and shape the communities we live in. Despite their importance, data about signed relations is rare and commonly gathered through surveys. Interaction data is more abundant, for instance, in the form of proximity or communication data. So far, though, it could not be utilized to detect signed relations. In this paper, we show how the underlying signed relations can be extracted with such data. Employing a statistical network approach, we construct networks of signed relations in four communities. We then show that these relations correspond to the ones reported in surveys. Additionally, the inferred relations allow us to study the homophily of individuals with respect to gender, religious beliefs, and financial backgrounds. We evaluate the importance of triads in the signed network to study group cohesion.

LGMay 19, 2022
Disentangling Active and Passive Cosponsorship in the U.S. Congress

Giuseppe Russo, Christoph Gote, Laurence Brandenberger et al.

In the U.S. Congress, legislators can use active and passive cosponsorship to support bills. We show that these two types of cosponsorship are driven by two different motivations: the backing of political colleagues and the backing of the bill's content. To this end, we develop an Encoder+RGCN based model that learns legislator representations from bill texts and speech transcripts. These representations predict active and passive cosponsorship with an F1-score of 0.88. Applying our representations to predict voting decisions, we show that they are interpretable and generalize to unseen tasks.

SENov 21, 2019Code
Analysing Time-Stamped Co-Editing Networks in Software Development Teams using git2net

Christoph Gote, Ingo Scholtes, Frank Schweitzer

Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts. Because this neglects detailed information on code changes and code ownership we introduce git2net, a scalable python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. We apply our tool in two case studies using GitHub repositories of multiple Open Source as well as a commercial software project. Specifically, we use data on more than 1.2 million commits and more than 25'000 developers to test a hypothesis on the relation between developer productivity and co-editing patterns in software teams. We argue that git2net opens up a massive new source of high-resolution data on human collaboration patterns that can be used to advance theory in empirical software engineering, computational social science, and organisational studies.

SEMar 25, 2019Code
git2net - Mining Time-Stamped Co-Editing Networks from Large git Repositories

Christoph Gote, Ingo Scholtes, Frank Schweitzer

Data from software repositories have become an important foundation for the empirical study of software engineering processes. A recurring theme in the repository mining literature is the inference of developer networks capturing e.g. collaboration, coordination, or communication from the commit history of projects. Most of the studied networks are based on the co-authorship of software artefacts defined at the level of files, modules, or packages. While this approach has led to insights into the social aspects of software development, it neglects detailed information on code changes and code ownership, e.g. which exact lines of code have been authored by which developers, that is contained in the commit log of software projects. Addressing this issue, we introduce git2net, a scalable python software that facilitates the extraction of fine-grained co-editing networks in large git repositories. It uses text mining techniques to analyse the detailed history of textual modifications within files. This information allows us to construct directed, weighted, and time-stamped networks, where a link signifies that one developer has edited a block of source code originally written by another developer. Our tool is applied in case studies of an Open Source and a commercial software project. We argue that it opens up a massive new source of high-resolution data on human collaboration patterns.

SEJun 23, 2015Code
How do OSS projects change in number and size? A large-scale analysis to test a model of project growth

Frank Schweitzer, Vahan Nanumyan, Claudio J. Tessone et al.

Established Open Source Software (OSS) projects can grow in size if new developers join, but also the number of OSS projects can grow if developers choose to found new projects. We discuss to what extent an established model for firm growth can be applied to the dynamics of OSS projects. Our analysis is based on a large-scale data set from SourceForge (SF) consisting of monthly data for 10 years, for up to 360'000 OSS projects and up to 340'000 developers. Over this time period, we find an exponential growth both in the number of projects and developers, with a remarkable increase of single-developer projects after 2009. We analyze the monthly entry and exit rates for both projects and developers, the growth rate of established projects and the monthly project size distribution. To derive a prediction for the latter, we use modeling assumptions of how newly entering developers choose to either found a new project or to join existing ones. Our model applies only to collaborative projects that are deemed to grow in size by attracting new developers. We verify, by a thorough statistical analysis, that the Yule-Simon distribution is a valid candidate for the size distribution of collaborative projects except for certain time periods where the modeling assumptions no longer hold. We detect and empirically test the reason for this limitation, i.e., the fact that an increasing number of established developers found additional new projects after 2009.

SEJun 15, 2013Code
The Role of Emotions in Contributors Activity: A Case Study on the GENTOO Community

David Garcia, Marcelo Serrano Zanetti, Frank Schweitzer

We analyse the relation between the emotions and the activity of contributors in the Open Source Software project Gentoo. Our case study builds on extensive data sets from the project's bug tracking platform Bugzilla, to quantify the activity of contributors, and its mail archives, to quantify the emotions of contributors by means of sentiment analysis. The Gentoo project is known for a period of centralization within its bug triaging community. This was followed by considerable changes in community organization and performance after the sudden retirement of the central contributor. We analyse how this event correlates with the negative emotions, both in bilateral email discussions with the central contributor, and at the level of the whole community of contributors. We then extend our study to consider the activity patterns on Gentoo contributors in general. We find that contributors are more likely to become inactive when they express strong positive or negative emotions in the bug tracker, or when they deviate from the expected value of emotions in the mailing list. We use these insights to develop a Bayesian classifier that detects the risk of contributors leaving the project. Our analysis opens new perspectives for measuring online contributor motivation by means of sentiment analysis and for real-time predictions of contributor turnover in Open Source Software projects.

SEFeb 28, 2013Code
The Rise and Fall of a Central Contributor: Dynamics of Social Organization and Performance in the Gentoo Community

Marcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone et al.

Social organization and division of labor crucially influence the performance of collaborative software engineering efforts. In this paper, we provide a quantitative analysis of the relation between social organization and performance in Gentoo, an Open Source community developing a Linux distribution. We study the structure and dynamics of collaborations as recorded in the project's bug tracking system over a period of ten years. We identify a period of increasing centralization after which most interactions in the community were mediated by a single central contributor. In this period of maximum centralization, the central contributor unexpectedly left the project, thus posing a significant challenge for the community. We quantify how the rise, the activity as well as the subsequent sudden dropout of this central contributor affected both the social organization and the bug handling performance of the Gentoo community. We analyze social organization from the perspective of network theory and augment our quantitative findings by interviews with prominent members of the Gentoo community which shared their personal insights.

SEFeb 27, 2013Code
Categorizing Bugs with Social Networks: A Case Study on Four Open Source Software Communities

Marcelo Serrano Zanetti, Ingo Scholtes, Claudio Juan Tessone et al.

Efficient bug triaging procedures are an important precondition for successful collaborative software engineering projects. Triaging bugs can become a laborious task particularly in open source software (OSS) projects with a large base of comparably inexperienced part-time contributors. In this paper, we propose an efficient and practical method to identify valid bug reports which a) refer to an actual software bug, b) are not duplicates and c) contain enough information to be processed right away. Our classification is based on nine measures to quantify the social embeddedness of bug reporters in the collaboration network. We demonstrate its applicability in a case study, using a comprehensive data set of more than 700,000 bug reports obtained from the Bugzilla installation of four major OSS communities, for a period of more than ten years. For those projects that exhibit the lowest fraction of valid bug reports, we find that the bug reporters' position in the collaboration network is a strong indicator for the quality of bug reports. Based on this finding, we develop an automated classification scheme that can easily be integrated into bug tracking platforms and analyze its performance in the considered OSS communities. A support vector machine (SVM) to identify valid bug reports based on the nine measures yields a precision of up to 90.3% with an associated recall of 38.9%. With this, we significantly improve the results obtained in previous case studies for an automated early identification of bugs that are eventually fixed. Furthermore, our study highlights the potential of using quantitative measures of social organization in collaborative software engineering. It also opens a broad perspective for the integration of social awareness in the design of support infrastructures.

SEAug 21, 2012Code
A Quantitative Study of Social Organisation in Open Source Software Communities

Marcelo Serrano Zanetti, Emre Sarigol, Ingo Scholtes et al.

The success of open source projects crucially depends on the voluntary contributions of a sufficiently large community of users. Apart from the mere size of the community, interesting questions arise when looking at the evolution of structural features of collaborations between community members. In this article, we discuss several network analytic proxies that can be used to quantify different aspects of the social organisation in social collaboration networks. We particularly focus on measures that can be related to the cohesiveness of the communities, the distribution of responsibilities and the resilience against turnover of community members. We present a comparative analysis on a large-scale dataset that covers the full history of collaborations between users of 14 major open source software communities. Our analysis covers both aggregate and time-evolving measures and highlights differences in the social organisation across communities. We argue that our results are a promising step towards the definition of suitable, potentially multi-dimensional, resilience and risk indicators for open source software communities.

SEJan 18, 2012Code
A Network Perspective on Software Modularity

Marcelo Serrano Zanetti, Frank Schweitzer

Modularity is a desirable characteristic for software systems. In this article we propose to use a quantitative method from complex network sciences to estimate the coherence between the modularity of the dependency network of large open source Java projects and their decomposition in terms of Java packages. The results presented in this article indicate that our methodology offers a promising and reasonable quantitative approach with potential impact on software engineering processes.

CYMay 18, 2025
How Malicious AI Swarms Can Threaten Democracy: The Fusion of Agentic AI and LLMs Marks a New Frontier in Information Warfare

Daniel Thilo Schroeder, Meeyoung Cha, Andrea Baronchelli et al.

Public opinion manipulation has entered a new phase, amplifying its roots in rhetoric and propaganda. Advances in large language models (LLMs) and autonomous agents now let influence campaigns reach unprecedented scale and precision. Researchers warn AI could foster mass manipulation. Generative tools can expand propaganda output without sacrificing credibility and inexpensively create election falsehoods that are rated as more human-like than those written by humans. Techniques meant to refine AI reasoning, such as chain-of-thought prompting, can just as effectively be used to generate more convincing falsehoods. Enabled by these capabilities, another disruptive threat is emerging: swarms of collaborative, malicious AI agents. Fusing LLM reasoning with multi-agent architectures, these systems are capable of coordinating autonomously, infiltrating communities, and fabricating consensus cheaply. By adaptively mimicking human social dynamics, they threaten democracy.

SEJan 12, 2022
Big Data = Big Insights? Operationalising Brooks' Law in a Massive GitHub Data Set

Christoph Gote, Pavlin Mavrodiev, Frank Schweitzer et al.

Massive data from software repositories and collaboration tools are widely used to study social aspects in software development. One question that several recent works have addressed is how a software project's size and structure influence team productivity, a question famously considered in Brooks' law. Recent studies using massive repository data suggest that developers in larger teams tend to be less productive than smaller teams. Despite using similar methods and data, other studies argue for a positive linear or even super-linear relationship between team size and productivity, thus contesting the view of software economics that software projects are diseconomies of scale. In our work, we study challenges that can explain the disagreement between recent studies of developer productivity in massive repository data. We further provide, to the best of our knowledge, the largest, curated corpus of GitHub projects tailored to investigate the influence of team size and collaboration patterns on individual and collective productivity. Our work contributes to the ongoing discussion on the choice of productivity metrics in the operationalisation of hypotheses about determinants of successful software projects. It further highlights general pitfalls in big data analysis and shows that the use of bigger data sets does not automatically lead to more reliable insights.

LGJul 13, 2020
Predicting Sequences of Traversed Nodes in Graphs using Network Models with Multiple Higher Orders

Christoph Gote, Giona Casiraghi, Frank Schweitzer et al.

We propose a novel sequence prediction method for sequential data capturing node traversals in graphs. Our method builds on a statistical modelling framework that combines multiple higher-order network models into a single multi-order model. We develop a technique to fit such multi-order models in empirical sequential data and to select the optimal maximum order. Our framework facilitates both next-element and full sequence prediction given a sequence-prefix of any length. We evaluate our model based on six empirical data sets containing sequences from website navigation as well as public transport systems. The results show that our method out-performs state-of-the-art algorithms for next-element prediction. We further demonstrate the accuracy of our method during out-of-sample sequence prediction and validate that our method can scale to data sets with millions of sequences.

HCMay 12, 2016
The Dynamics of Emotions in Online Interaction

David Garcia, Arvid Kappas, Dennis Küster et al.

We study the changes in emotional states induced by reading and participating in online discussions, empirically testing a computational model of online emotional interaction. Using principles of dynamical systems, we quantify changes in valence and arousal through subjective reports, as recorded in three independent studies including 207 participants (110 female). In the context of online discussions, the dynamics of valence and arousal are composed of two forces: an internal relaxation towards baseline values independent of the emotional charge of the discussion, and a driving force of emotional states that depends on the content of the discussion. The dynamics of valence show the existence of positive and negative tendencies, while arousal increases when reading emotional content regardless of its polarity. The tendency of participants to take part in the discussion increases with positive arousal. When participating in an online discussion, the content of participants' expression depends on their valence, and their arousal significantly decreases afterwards as a regulation mechanism. We illustrate how these results allow the design of agent-based models to reproduce and analyze emotions in online communities. Our work empirically validates the microdynamics of a model of online collective emotions, bridging online data analysis with research in the laboratory.