Ann Barcomb

SE
h-index: 9
Papers: 7
Citations: 22
Novelty: 26%
AI Score: 39

7 Papers

SE · Aug 18, 2020 · Code
A Replication Study on Measuring the Growth of Open Source

Michael Dorner, Maximilian Capraro, Ann Barcomb et al.

Context: Over the last decades, open-source software has pervaded the software industry and has become one of the key pillars in software engineering. The incomparable growth of open source reflected that pervasion: Prior work described open source as a whole to be growing linearly, polynomially, or even exponentially. Objective: In this study, we explore the long-term growth of open source and seek to corroborate previous findings by replicating previous studies on measuring the growth of open source projects. Method: We replicate four existing measurements on the growth of open source on a sample of 172,833 open-source projects using Open Hub as the measurement system: We analyzed lines of code, commits, new projects, and the number of open-source contributors over the last 30 years in the known open-source universe. Results: We found the growth of open source to be exhausted: After an initial exponential growth, all measurements show a monotonic downward trend since their peak in 2013. None of the existing growth models could stand the test of time. Conclusion: Our results raise more questions on the growth of open source and the representativeness of Open Hub as a proxy for describing open source. We discuss multiple interpretations for our observations and encourage further research using alternative data sets.

11.0 · CL · Apr 22
Intersectional Fairness in Large Language Models

Chaima Boufaied, Ronnie De Souza Santos, Ann Barcomb

Large Language Models (LLMs) are increasingly deployed in socially sensitive settings, raising concerns about fairness and biases, particularly across intersectional demographic attributes. In this paper, we systematically evaluate intersectional fairness in six LLMs using ambiguous and disambiguated contexts from two benchmark datasets. We assess LLM behavior using bias scores, subgroup fairness metrics, accuracy, and consistency through multi-run analysis across contexts and negative and non-negative question polarities. Our results show that while modern LLMs generally perform well in ambiguous contexts, this limits the informativeness of fairness metrics due to sparse non-unknown predictions. In disambiguated contexts, LLM accuracy is influenced by stereotype alignment, with models being more accurate when the correct answer reinforces a stereotype than when it contradicts it. This pattern is especially pronounced in race-gender intersections, where directional bias toward stereotypes is stronger. Subgroup fairness metrics further indicate that, despite low observed disparity in some cases, outcome distributions remain uneven across intersectional groups. Across repeated runs, responses also vary in consistency, including stereotype-aligned responses. Overall, our findings show that apparent model competence is partly associated with stereotype-consistent cues, and no evaluated LLM achieves consistently reliable or fair behavior across intersectional settings. These findings highlight the need for evaluation beyond accuracy, emphasizing the importance of combining bias, subgroup fairness, and consistency metrics across intersectional groups, contexts, and repeated runs.

30.0 · CY · Apr 6
Teaching Empathy in Software Engineering Education in the Age of Artificial Intelligence

Ronnie de Souza Santos, Cleyton Magalhães, Giuseppe Destefanis et al.

Empathy has been discussed as a relevant human capability in software engineering, particularly in activities that require understanding users, stakeholders, and the societal implications of technological systems. This relevance becomes more pronounced in the context of artificial intelligence, where software increasingly participates in decisions that affect diverse individuals and communities. However, limited guidance exists on how empathy can be integrated into technical software engineering education in ways that connect with the development of AI-enabled systems. This study investigates teaching practices that educators use to incorporate empathy into software engineering courses. Using qualitative analysis of educator-reported practices, we identified five categories through which empathy is operationalized within technical coursework: societal framing of AI systems, fairness and accessibility considerations in design and evaluation, representation of diverse users, stakeholder role awareness and responsibility, and structured reflection and feedback during development processes. The findings indicate that empathy can be embedded within core development activities rather than taught as a separate topic, enabling students to reason about bias, accessibility, accountability, and the societal consequences of AI technologies. These results contribute a structured view of how empathy-oriented practices can be incorporated into software engineering education to support the preparation of students who will develop AI-enabled systems.

55.7 · SE · Mar 12
How Fair is Software Fairness Testing?

Ann Barcomb, Mariana Pinheiro Bento, Giuseppe Destefanis et al.

Software fairness testing is a central method for evaluating AI systems, yet the meaning of fairness is often treated as fixed and universally applicable. This vision paper positions fairness testing as culturally situated and examines the problem across three dimensions. First, fairness metrics encode particular cultural values while marginalizing others. Second, test datasets are predominantly designed from Western contexts, excluding knowledge systems grounded in oral traditions, Indigenous languages, and non-digital communities. Third, fairness testing raises ethical concerns, including the reliance on low-paid data labeling in the Global South and, associated with this, the environmental costs of training and deploying large-scale models, which disproportionately affect climate-vulnerable populations. Addressing these issues requires rethinking fairness testing beyond universal metrics and moving toward evaluation frameworks that respect cultural plurality and acknowledge the right to refuse algorithmic mediation.

SE · Apr 27, 2025
From Inductive to Deductive: LLMs-Based Qualitative Data Analysis in Requirements Engineering

Syed Tauhid Ullah Shah, Mohamad Hussein, Ann Barcomb et al.

Requirements Engineering (RE) is essential for developing complex and regulated software projects. Given the challenges in transforming stakeholder inputs into consistent software designs, Qualitative Data Analysis (QDA) provides a systematic approach to handling free-form data. However, traditional QDA methods are time-consuming and heavily reliant on manual effort. In this paper, we explore the use of Large Language Models (LLMs), including GPT-4, Mistral, and LLaMA-2, to improve QDA tasks in RE. Our study evaluates LLMs' performance in inductive (zero-shot) and deductive (one-shot, few-shot) annotation tasks, revealing that GPT-4 achieves substantial agreement with human analysts in deductive settings, with Cohen's Kappa scores exceeding 0.7, while zero-shot performance remains limited. Detailed, context-rich prompts significantly improve annotation accuracy and consistency, particularly in deductive scenarios, and GPT-4 demonstrates high reliability across repeated runs. These findings highlight the potential of LLMs to support QDA in RE by reducing manual effort while maintaining annotation quality. The structured labels automatically provide traceability of requirements and can be directly utilized as classes in domain models, facilitating systematic software design.
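The agreement measure named in this abstract, Cohen's Kappa, can be illustrated with a minimal sketch. This is not the authors' code: the function name `cohens_kappa`, the label names, and the six sample annotations are hypothetical, chosen only to show how chance-corrected agreement between a human analyst and a model is computed.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label for every item.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical annotations (illustrative only, not data from the paper).
human = ["functional", "functional", "quality", "quality", "constraint", "functional"]
model = ["functional", "functional", "quality", "constraint", "constraint", "functional"]

kappa = cohens_kappa(human, model)
print(f"kappa = {kappa:.3f}")
```

With these made-up labels the two annotators agree on five of six items, and the chance-corrected score is 17/23, roughly 0.739, which would fall above the 0.7 level the abstract reports for deductive settings.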

SE · Apr 18, 2025
A Survey for What Developers Require in AI-powered Tools that Aid in Component Selection in CBSD

Mahdi Jaberzadeh Ansari, Ann Barcomb

Although more than four decades have passed since the first component-based software development (CBSD) studies were conducted, there is still no standard method or tool for component selection that is widely accepted by the industry. The gulf between industry and academia contributes to the lack of an accepted tool. We conducted a mixed methods survey of nearly 100 people engaged in component-based software engineering practice or research to better understand the problems facing industry, how these needs could be addressed, and current best practices employed in component selection. We also sought to identify and prioritize quality criteria for component selection from an industry perspective. In response to the call for CBSD component selection tools to incorporate recent technical advances, we also explored the perceptions of professionals about AI-driven tools, present and envisioned.

AI · Jul 11, 2021
Pattern Discovery and Validation Using Scientific Research Methods

Dirk Riehle, Nikolay Harutyunyan, Ann Barcomb

Pattern discovery, the process of discovering previously unrecognized patterns, is often performed as an ad-hoc process with little resulting certainty in the quality of the proposed patterns. Pattern validation, the process of validating the accuracy of proposed patterns, remains dominated by the simple heuristic of "the rule of three". This article shows how to use established scientific research methods for the purpose of pattern discovery and validation. We present a specific approach, called the handbook method, that uses the qualitative survey, action research, and case study research for pattern discovery and evaluation, and we discuss the underlying principle of using scientific methods in general. We evaluate the handbook method using three exploratory studies and demonstrate its usefulness.