3 Papers

SE · Mar 29
Advancing Evidence-Based Social Sustainability in Software Engineering: A Research Roadmap

Bimpe Ayoola, Anielle Andrade, Ronnie de Souza Santos et al.

Social sustainability in software development means creating and maintaining systems that promote pro-social values (e.g., human well-being, equity), both now and in the future. However, social sustainability lacks clear conceptual and methodological foundations, and often takes a back seat to speed and profit. This paper therefore reports a narrative review of existing definitions of social sustainability in software development and identifies key aspects of social sustainability, including social equity, well-being, and community cohesion. Challenges around measuring and integrating social sustainability into practice are analyzed conceptually. The paper then proposes a comprehensive definition of social sustainability and outlines a roadmap for measuring and integrating social sustainability into software engineering processes.

CY · May 3
Principles and Guidelines for Randomized Controlled Trials in AI Evaluation

Christopher Kelly, Angelica Chowdhury, Alexandra Campili et al.

This work establishes a foundational framework for standardizing AI evaluation RCTs (sometimes called human uplift studies). Drawing on experimental practices from disciplines with established RCT traditions, including software engineering, economics, clinical and health sciences, and psychology, we adopt the four-validity framework of Shadish et al. (2002) and extend it with a fifth principle on transparency, repeatability, and verification, adapted from the Transparency and Openness Promotion (TOP) Guidelines (Center for Open Science, 2025). We operationalize all five principles into 33 guidelines tailored to AI evaluation RCTs, each expressed as a requirement with a rationale, implementation instructions, and an evidence base. We position the principles and guidelines as serving three roles for AI evaluation RCTs: a design tool for planning studies, an evaluation rubric for assessing existing work, and a blueprint for standard setting as the field converges on norms. Our framework extends prior work by centering evaluation on human performance rather than model output alone; formalizing causal inference through RCT methodology for AI contexts; integrating heterogeneity analysis and practical significance assessment; implementing a graded transparency and repeatability framework; and addressing AI-specific challenges, including model versioning, human-AI interaction dynamics, contamination and spillover effects, and equitable impact assessment.

SE · Mar 12
Team Diversity Promotes Software Fairness: An Experiment on Fairness-Aware Requirements Prioritization

Cleyton Magalhães, Ronnie de Souza Santos, Bimpe Ayoola et al.

Background: Fairness and diversity are receiving growing attention in software engineering, particularly as AI and machine learning systems increasingly influence decision-making processes. While fairness is often examined at the algorithmic or data level, there is limited understanding of how it is addressed during the early stages of software development. Moreover, little is known about how team diversity affects fairness-related decisions in software projects. Aims: This study investigates how diversity in software teams influences fairness-aware behavior during requirements prioritization. Method: A controlled experiment was conducted with 27 pairs of software engineering students: 13 LGBTQ-diverse pairs and 14 non-diverse pairs. Each pair prioritized user stories with varying fairness implications. Descriptive statistics were used to analyze attitudes and prioritization outcomes, and thematic analysis was applied to examine the reasoning behind participants' decisions. Results: Both groups generally aligned with fairness principles, prioritizing features that promoted equitable treatment and rejecting those that posed fairness risks. However, LGBTQ-diverse pairs were more consistent in rejecting fairness-risking stories and made fewer fairness-related misprioritization errors. Their reasoning emphasized inclusion, non-discrimination, and ethical responsibility, whereas non-diverse pairs adopted a more pragmatic, goal-oriented perspective. Conclusions: The findings indicate that fairness should be considered from the earliest stages of software development. Team diversity can enhance the identification and interpretation of fairness issues during requirements analysis, fostering more reflective and inclusive decision-making.