Paul Ralph

SE
h-index11
18papers
1,124citations
Novelty28%
AI Score44

18 Papers

SEMay 10
Guidelines for Empirical Studies in Software Engineering involving Large Language Models

Sebastian Baltes, Florian Angermeir, Chetan Arora et al.

Large Language Models (LLMs) are widely used in software engineering (SE) research and practice, yet their non-determinism, opaque training data, and rapidly evolving models threaten the reproducibility and replicability of empirical studies. We address this challenge through a collaborative effort of 22 researchers, presenting a taxonomy of seven study types that organizes how LLMs are used in SE research, together with eight guidelines for designing and reporting such studies. Each guideline distinguishes requirements (must) from recommended practices (should) and is contextualized by the study types it applies to. Our guidelines recommend that researchers: (1) declare LLM usage and role; (2) report model versions, configurations, and customizations; (3) document the tool architecture beyond the model; (4) disclose prompts, their development, and interaction logs; (5) validate LLM outputs with humans; (6) include an open LLM as a baseline; (7) use suitable baselines, benchmarks, and metrics; and (8) articulate limitations and mitigations. We complement the guidelines with an applicability matrix mapping guidelines to study types and a reporting checklist for authors and reviewers. We maintain the study types and guidelines online as a living resource for the community to use and shape (llm-guidelines$.$org).

SEMar 16
Making Software Metrics Useful

Ewan Tempero, Paul Ralph

Most engineers use measurements to make decisions. However, measurements are rarely used for decisions about constructing software products. While many approaches to measuring attributes of software (``metrics'') have been developed, they are rarely used to answer useful questions such as ``Do I need to refactor this class?'' or ``Are these integration tests sufficient?'' Practitioners therefore question the value of software metrics. We argue that this situation arose because software metrics were developed without understanding metrology (the science of measurement) and suggest directions software metrics research should take.

SEMar 29
Advancing Evidence-Based Social Sustainability in Software Engineering: A Research Roadmap

Bimpe Ayoola, Anielle Andrade, Ronnie de Souza Santos et al.

Social sustainability in software development means creating and maintaining systems that promote pro-social values (e.g., human well-being, equity), both now and in the future. However, social sustainability lacks clear conceptual and methodological foundations, and often takes a back seat to speed and profit. This paper therefore reports a narrative review of existing definitions of social sustainability in software development and identifies key aspects of social sustainability including social equity, well-being, and community cohesion. Challenges around measuring and integrating social sustainability into practice are conceptually analyzed. The paper then proposes a comprehensive definition of social sustainability and outlines a roadmap for measuring and integrating social sustainability into software engineering processes.

CLApr 24, 2024
The Promise and Challenges of Using LLMs to Accelerate the Screening Process of Systematic Reviews

Aleksi Huotala, Miikka Kuutila, Paul Ralph et al.

Systematic review (SR) is a popular research method in software engineering (SE). However, conducting an SR takes an average of 67 weeks. Thus, automating any step of the SR process could reduce the effort associated with SRs. Our objective is to investigate if Large Language Models (LLMs) can accelerate title-abstract screening by simplifying abstracts for human screeners, and automating title-abstract screening. We performed an experiment where humans screened titles and abstracts for 20 papers with both original and simplified abstracts from a prior SR. The experiment with human screeners was reproduced with GPT-3.5 and GPT-4 LLMs to perform the same screening tasks. We also studied if different prompting techniques (Zero-shot (ZS), One-shot (OS), Few-shot (FS), and Few-shot with Chain-of-Thought (FS-CoT)) improve the screening performance of LLMs. Lastly, we studied if redesigning the prompt used in the LLM reproduction of screening leads to improved performance. Text simplification did not increase the screeners' screening performance, but reduced the time used in screening. Screeners' scientific literacy skills and researcher status predict screening performance. Some LLM and prompt combinations perform as well as human screeners in the screening tasks. Our results indicate that the GPT-4 LLM is better than its predecessor, GPT-3.5. Additionally, Few-shot and One-shot prompting outperforms Zero-shot prompting. Using LLMs for text simplification in the screening process does not significantly improve human performance. Using LLMs to automate title-abstract screening seems promising, but current LLMs are not significantly more accurate than human screeners. To recommend the use of LLMs in the screening process of SRs, more research is needed. We recommend future SR studies publish replication packages with screening data to enable more conclusive experimenting with LLM screening.

SEFeb 21, 2022
A Grounded Theory of Coordination in Remote-First and Hybrid Software Teams

Ronnie E. de Souza Santos, Paul Ralph

While the long-term effects of the COVID-19 pandemic on software professionals and organizations are difficult to predict, it seems likely that working from home, remote-first teams, distributed teams, and hybrid (part-remote/part-office) teams will be more common. It is therefore important to investigate the challenges that software teams and organizations face with new remote and hybrid work. Consequently, this paper reports a year-long, participant-observation, constructivist grounded theory study investigating the impact of working from home on software development. This study resulted in a theory of software team coordination. Briefly, shifting from in-office to at-home work fundamentally altered coordination within software teams. While group cohesion and more effective communication appear protective, coordination is undermined by distrust, parenting and communication bricolage. Poor coordination leads to numerous problems including misunderstandings, help requests, lower job satisfaction among team members, and more ill-defined tasks. These problems, in turn, reduce overall project success and prompt professionals to alter their software development processes (in this case, from Scrum to Kanban). Our findings suggest that software organizations with many remote employees can improve performance by encouraging greater engagement within teams and supporting employees with family and childcare responsibilities.

SEFeb 15, 2022
Social Science Theories in Software Engineering Research

Tobias Lorey, Paul Ralph, Michael Felderer

As software engineering research becomes more concerned with the psychological, sociological and managerial aspects of software development, relevant theories from reference disciplines are increasingly important for understanding the field's core phenomena of interest. However, the degree to which software engineering research draws on relevant social sciences remains unclear. This study therefore investigates the use of social science theories in five influential software engineering journals over 13 years. It analyzes not only the extent of theory use but also what, how and where these theories are used. While 87 different theories are used, less than two percent of papers use a social science theory, most theories are used in only one paper, most social sciences are ignored, and the theories are rarely tested for applicability to software engineering contexts. Ignoring relevant social science theories may (1) undermine the community's ability to generate, elaborate and maintain a cumulative body of knowledge; and (2) lead to oversimplified models of software engineering phenomena. More attention to theory is needed for software engineering to mature as a scientific discipline.

SEJan 20, 2022
What Makes Effective Leadership in Agile Software Development Teams?

Lucas Gren, Paul Ralph

Effective leadership is one of the key drivers of business and project success, and one of the most active areas of management research. But how does leadership work in agile software development, which emphasizes self-management and self-organization and marginalizes traditional leadership roles? To find out, this study examines agile leadership from the perspective of thirteen professionals who identify as agile leaders, in different roles, at ten different software development companies of varying sizes. Data from semi-structured interviews reveals that leadership: (1) is dynamically shared among team members; (2) engenders a sense of belonging to the team; and (3) involves balancing competing organizational cultures (e.g. balancing the new agile culture with the old milestone-driven culture). In other words, agile leadership is a property of a team, not a role, and effectiveness depends on agile team members' identifying with the team, accepting responsibility, and being sensitive to cultural conflict.

SEOct 7, 2020
Empirical Standards for Software Engineering Research

Paul Ralph, Nauman bin Ali, Sebastian Baltes et al.

Empirical Standards are natural-language models of a scientific community's expectations for a specific kind of study (e.g. a questionnaire survey). The ACM SIGSOFT Paper and Peer Review Quality Initiative generated empirical standards for research methods commonly used in software engineering. These living documents, which should be continuously revised to reflect evolving consensus around research best practices, will improve research quality and make peer review more effective, reliable, transparent and fair.

SEMay 3, 2020
Pandemic Programming: How COVID-19 affects software developers and how their organizations can help

Paul Ralph, Sebastian Baltes, Gianisa Adisaputri et al.

Context. As a novel coronavirus swept the world in early 2020, thousands of software developers began working from home. Many did so on short notice, under difficult and stressful conditions. Objective. This study investigates the effects of the pandemic on developers' wellbeing and productivity. Method. A questionnaire survey was created mainly from existing, validated scales and translated into 12 languages. The data was analyzed using non-parametric inferential statistics and structural equation modeling. Results. The questionnaire received 2225 usable responses from 53 countries. Factor analysis supported the validity of the scales and the structural model achieved a good fit (CFI = 0.961, RMSEA = 0.051, SRMR = 0.067). Confirmatory results include: (1) the pandemic has had a negative effect on developers' wellbeing and productivity; (2) productivity and wellbeing are closely related; (3) disaster preparedness, fear related to the pandemic and home office ergonomics all affect wellbeing or productivity. Exploratory analysis suggests that: (1) women, parents and people with disabilities may be disproportionately affected; (2) different people need different kinds of support. Conclusions. To improve employee productivity, software companies should focus on maximizing employee wellbeing and improving the ergonomics of employees' home offices. Women, parents and disabled persons may require extra support.

SEFeb 18, 2020
Sampling in Software Engineering Research: A Critical Review and Guidelines

Sebastian Baltes, Paul Ralph

Representative sampling appears rare in empirical software engineering research. Not all studies need representative samples, but a general lack of representative sampling undermines a scientific field. This article therefore reports a critical review of the state of sampling in recent, high-quality software engineering research. The key findings are: (1) random sampling is rare; (2) sophisticated sampling strategies are very rare; (3) sampling, representativeness and randomness often appear misunderstood. These findings suggest that software engineering research has a generalizability crisis. To address these problems, this paper synthesizes existing knowledge of sampling into a succinct primer and proposes extensive guidelines for improving the conduct, presentation and evaluation of sampling in software engineering research. It is further recommended that while researchers should strive for more representative samples, disparaging non-probability sampling is generally capricious and particularly misguided for predominately qualitative research.

SEFeb 28, 2019
Requirements Framing Affects Design Creativity

Rahul Mohanani, Burak Turhan, Paul Ralph

Design creativity, the originality and practicality of a solution concept is critical for the success of many software projects. However, little research has investigated the relationship between the way desiderata are presented and design creativity. This study therefore investigates the impact of presenting desiderata as ideas, requirements or prioritized requirements on design creativity. Two between-subjects randomized controlled experiments were conducted with 42 and 34 participants. Participants were asked to create design concepts from a list of desiderata. Participants who received desiderata framed as requirements or prioritized requirements created designs that are, on average, less original but more practical than the designs created by participants who received desiderata framed as ideas. This suggests that more formal, structured presentations of desiderata are less appropriate where a creative solution is desired. The results also show that design performance is highly susceptible to minor changes in the vernacular used to communicate desiderata.

SEFeb 18, 2018
The Dangerous Dogmas of Software Engineering

Paul Ralph, Briony J. Oates

To legitimize itself as a scientific discipline, the software engineering academic community must let go of its non-empirical dogmas. A dogma is belief held regardless of evidence. This paper analyzes the nature and detrimental effects of four software engineering dogmas - 1) the belief that software has "requirements"; 2) the division of software engineering tasks into analysis, design, coding and testing; 3) the belief that software engineering is predominantly concerned with designing "software" systems; 4) the belief that software engineering follows methods effectively. Deconstructing these dogmas reveals that they each oversimplify and over-rationalize aspects of software engineering practice, which obscures underlying phenomena and misleads researchers and practitioners. Evidenced-based practice is analyzed as a means to expose and repudiate non-empirical dogmas. This analysis results in several novel recommendations for overcoming the practical challenges of evidence-based practice.

SEFeb 18, 2018
Consensus in Software Engineering: A Cognitive Mapping Study

Pontus Johnson, Paul Ralph, Mathias Ekstedt et al.

Background: Philosophers of science including Collins, Feyerabend, Kuhn and Latour have all emphasized the importance of consensus within scientific communities of practice. Consensus is important for maintaining legitimacy with outsiders, orchestrating future research, developing educational curricula and agreeing industry standards. Low consensus contrastingly undermines a field's reputation and hinders peer review. Aim: This paper aims to investigate the degree of consensus within the software engineering academic community concerning members' implicit theories of software engineering. Method: A convenience sample of 60 software engineering researchers produced diagrams describing their personal understanding of causal relationships between core software engineering constructs. The diagrams were then analyzed for patterns and clusters. Results: At least three schools of thought may be forming; however, their interpretation is unclear since they do not correspond to known divisions within the community (e.g. Agile vs. Plan-Driven methods). Furthermore, over one third of participants do not belong to any cluster. Conclusion: Although low consensus is common in social sciences, the rapid pace of innovation observed in software engineering suggests that high consensus is achievable given renewed commitment to empiricism and evidence-based practice.

SEJul 12, 2017
Cognitive Biases in Software Engineering: A Systematic Mapping Study

Rahul Mohanani, Iflaah Salman, Burak Turhan et al.

One source of software project challenges and failures is the systematic errors introduced by human cognitive biases. Although extensively explored in cognitive psychology, investigations concerning cognitive biases have only recently gained popularity in software engineering (SE) research. This paper therefore systematically maps, aggregates and synthesizes the literature on cognitive biases in software engineering to generate a comprehensive body of knowledge, understand state of the art research and provide guidelines for future research and practise. Focusing on bias antecedents, effects and mitigation techniques, we identified 65 articles, which investigate 37 cognitive biases, published between 1990 and 2016. Despite strong and increasing interest, the results reveal a scarcity of research on mitigation techniques and poor theoretical foundations in understanding and interpreting cognitive biases. Although bias-related research has generated many new insights in the software engineering community, specific bias mitigation techniques are still needed for software professionals to overcome the deleterious effects of cognitive biases on their work.

SEJul 3, 2013
Software Engineering Process Theory: A Multi-Method Comparison of Sensemaking-CoevoIution-Implementation Theory and Function-Behavior-Structure Theory

Paul Ralph

Many academics have called for increasing attention to theory in software engineering. Consequently, this paper empirically evaluates two dissimilar software development process theories - one expressing a more traditional, methodical view (FBS) and one expressing an alternative, more improvisational view (SCI). A primarily quantitative survey of more than 1300 software developers is combined with four qualitative case studies to achieve a simultaneously broad and deep empirical evaluation. Case data analysis using a closed-ended, a priori coding scheme based on the two theories strongly supports SCI, as does analysis of questionnaire response distributions (p<0.001; chi-square goodness of fit test). Furthermore, case-questionnaire triangulation found no evidence that support for SCI varied by participants' gender, education, experience, nationality or the size or nature of their projects. This suggests that instead of iteration between weakly-coupled phases (analysis, design, coding, testing), it is more accurate and useful to conceptualize development as ad hoc oscillation between organizing perceptions of the project context (Sensemaking), simultaneously improving mental pictures of the context and design artifact (Coevolution) and constructing, debugging and deploying software artifacts (Implementation).

SEMar 30, 2013
The Illusion of Requirements in Software Development

Paul Ralph

It is widely accepted that understanding system requirements is important for software development project success. However, this paper presents two novel challenges to the requirements concept. First, where many plausible approaches to achieving a goal are evident, there may be insufficient overlap between approaches to form requirements. Second, while all plausible approaches may have sufficient overlap to state requirements, we cannot know that unless all approaches are identified and we are sure that none have been missed. This suggest that many, if not most, software projects may have too few requirements drive the design process, and that analysts may misrepresent design decisions as requirements to compensate.

SEMar 24, 2013
The Two Paradigms of Software Design

Paul Ralph

The dominant view of design in information systems and software engineering, the Rational Design Paradigm, views software development as a methodical, plan-centered, approximately rational process of optimizing a design candidate for known constraints and objectives. This paper synthesizes an Alternative Design Paradigm, which views software development as an amethodical, improvisational, emotional process of simultaneously framing the problem and building artifacts to address it. These conflicting paradigms are manifestations of a deeper philosophical conflict between rationalism and empiricism. The paper clarifies the nature, components and assumptions of each paradigm and explores the implications of the paradigmatic conflict for research, practice and education.

SEFeb 17, 2013
The Sensemaking-Coevolution-Implementation Theory of Software Design

Paul Ralph

Understanding software design practice is critical to understanding modern information systems development. New developments in empirical software engineering, information systems design science and the interdisciplinary design literature combined with recent advances in process theory and testability have created a situation ripe for innovation. Consequently, this paper utilizes these breakthroughs to formulate a process theory of software design practice: Sensemaking-Coevolution-Implementation Theory explains how complex software systems are created by collocated software development teams in organizations. It posits that an independent agent (design team) creates a software system by alternating between three activities: organizing their perceptions about the context, mutually refining their understandings of the context and design space, and manifesting their understanding of the design space in a technological artifact. This theory development paper defines and illustrates Sensemaking-Coevolution-Implementation Theory, grounds its concepts and relationships in existing literature, conceptually evaluates the theory and situates it in the broader context of information systems development.