SE Apr 28
Does social identity matter in software engineering? Assessing the case of research software engineers
Chukwudi Uwasomba, Tamara Lopez, Melanie Langer et al.
Social identity is a concept from psychology that refers to the part of an individual's identity that derives from their group membership(s). In this paper, we explore social identity in members of the professional community of Research Software Engineers (RSEs). Using a mixed-methods approach, our study combined computational linguistic analysis and inferential statistics to examine over 28,000 social media posts, 1,700 blogs, and survey responses from 381 professional RSEs. The findings highlight the emergence of a collective RSE identity and demonstrate its role in shaping professional wellbeing. This study contributes an interdisciplinary perspective by integrating social psychology and software engineering to show how a professional identity evolves and why it matters.
HC Oct 1, 2025
From keywords to semantics: Perceptions of large language models in data discovery
Maura E Halstead, Mark A. Green, Caroline Jay et al.
Current approaches to data discovery match keywords between metadata and queries. This matching requires researchers to know the exact wording that other researchers previously used, creating a challenging process that could lead to missing relevant data. Large Language Models (LLMs) could enhance data discovery by removing this requirement and allowing researchers to ask questions with natural language. However, we do not currently know if researchers would accept LLMs for data discovery. Using a human-centered artificial intelligence (HCAI) focus, we ran focus groups (N = 27) to understand researchers' perspectives towards LLMs for data discovery. Our conceptual model shows that the potential benefits are not enough for researchers to use LLMs instead of current technology. Barriers prevent researchers from fully accepting LLMs, but features around transparency could overcome them. Using our model will allow developers to incorporate features that result in an increased acceptance of LLMs for data discovery.
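The exact-wording limitation described in this abstract can be illustrated with a minimal Python sketch of keyword matching between a query and dataset metadata. The dataset records, the function names (`tokenize`, `keyword_match`, `search`) and the tokenisation rule are all hypothetical, invented for illustration, and are not taken from the paper.

```python
from __future__ import annotations

import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its alphabetic word tokens as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_match(query: str, metadata: str) -> bool:
    """Return True only if every query word appears verbatim in the metadata."""
    return tokenize(query) <= tokenize(metadata)


# Hypothetical metadata records, invented for illustration only.
datasets = {
    "ds1": "Annual survey of household income in England",
    "ds2": "Air pollution measurements near primary schools",
}


def search(query: str) -> list[str]:
    """Return the ids of datasets whose metadata contains all query words."""
    return [ds_id for ds_id, meta in datasets.items() if keyword_match(query, meta)]
```

Under these assumptions, `search("household income")` finds `ds1`, while `search("family earnings")` finds nothing, even though a researcher might consider the two queries equivalent.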
CY Oct 22, 2024
Contrasting Attitudes Towards Current and Future AI Applications for Computerised Interpretation of ECG: A Clinical Stakeholder Interview Study
Lukas Hughes-Noehrer, Leda Channer, Gabriel Strain et al.
Objectives: To investigate clinicians' attitudes towards current automated interpretation of ECG and novel AI technologies and their perception of computer-assisted interpretation. Materials and Methods: We conducted a series of interviews with clinicians in the UK. Our study: (i) explores the potential for AI, specifically future 'human-like' computing approaches, to facilitate ECG interpretation and support clinical decision making, and (ii) elicits their opinions about the importance of explainability and trustworthiness of AI algorithms. Results: We performed inductive thematic analysis on interview transcriptions from 23 clinicians and identified the following themes: (i) a lack of trust in current systems, (ii) positive attitudes towards future AI applications and requirements for these, (iii) the relationship between the accuracy and explainability of algorithms, and (iv) opinions on education, possible deskilling, and the impact of AI on clinical competencies. Discussion: Clinicians do not trust current computerised methods, but welcome future 'AI' technologies. Where clinicians trust future AI interpretation to be accurate, they are less concerned that it is explainable. They also preferred ECG interpretation that demonstrated the results of the algorithm visually. Whilst clinicians do not fear job losses, they are concerned about deskilling and the need to educate the workforce to use AI responsibly. Conclusion: Clinicians are positive about the future application of AI in clinical decision-making. Accuracy is a key factor of uptake and visualisations are preferred over current computerised methods. This is viewed as a potential means of training and upskilling, in contrast to the deskilling that automation might be perceived to bring.
HC Aug 4, 2021
Using Interaction Data to Predict Engagement with Interactive Media
Jonathan Carlton, Andy Brown, Caroline Jay et al.
Media is evolving from traditional linear narratives to personalised experiences, where control over information (or how it is presented) is given to individual audience members. Measuring and understanding audience engagement with this media is important in at least two ways: (1) a post-hoc understanding of how engaged audiences are with the content will help production teams learn from experience and improve future productions; (2) this type of media has potential for real-time measures of engagement to be used to enhance the user experience by adapting content on-the-fly. Engagement is typically measured by asking samples of users to self-report, which is time-consuming and expensive. In some domains, however, interaction data have been used to infer engagement. Fortuitously, the nature of interactive media facilitates a much richer set of interaction data than traditional media; our research aims to understand if these data can be used to infer audience engagement. In this paper, we report a study using data captured from audience interactions with an interactive TV show to model and predict engagement. We find that temporal metrics, including overall time spent on the experience and the interval between events, are predictive of engagement. The results demonstrate that interaction data can be used to infer users' engagement during and after an experience, and the proposed techniques are relevant to better understand audience preference and responses.
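As a rough illustration of the temporal metrics named in this abstract (overall time spent and the interval between events), the following Python sketch derives them from a list of event timestamps. The function name `temporal_metrics`, the feature names and the input format (timestamps in seconds) are assumptions made for this example; the abstract does not specify how the authors computed their features.

```python
from __future__ import annotations

from statistics import mean


def temporal_metrics(timestamps: list[float]) -> dict[str, float]:
    """Compute simple temporal features from interaction event timestamps (seconds).

    Hypothetical feature set: total time between first and last event,
    plus the mean and maximum gap between consecutive events.
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        # With fewer than two events there are no intervals to measure.
        return {"total_time": 0.0, "mean_interval": 0.0, "max_interval": 0.0}
    # Gaps between each pair of consecutive events.
    intervals = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return {
        "total_time": ordered[-1] - ordered[0],
        "mean_interval": mean(intervals),
        "max_interval": max(intervals),
    }
```

Features like these could then be passed to any standard classifier or regression model alongside self-reported engagement labels; that modelling step is not shown here.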
DL Apr 30, 2021
Number and quality of diagrams in scholarly publications is associated with number of citations
Guy Clarke Marshall, Caroline Jay, Andre Freitas
Diagrams are often used in scholarly communication. We analyse a corpus of diagrams found in scholarly computational linguistics conference proceedings (ACL 2017), and find inclusion of a system diagram to be correlated with higher numbers of citations after 3 years. Inclusion of over three diagrams in this 8-page limit conference was found to correlate with a lower citation count. Focusing on neural network system diagrams, we find a correlation between highly cited papers and "good diagramming practice" quantified by level of compliance with a set of diagramming guidelines. Two diagram classification types (one visually based, one mental model based) were not found to correlate with number of citations, but enabled quantification of heterogeneity in those dimensions. Exploring scholarly paper-writing guides, we find diagrams to be a neglected medium. This study suggests that diagrams may be a useful source of quality data for predicting citations, and that "graphicacy" is a key skill for scholars with insufficient support at present.
HC Apr 30, 2021
Why scholars are diagramming neural network models
Guy Clarke Marshall, Caroline Jay, Andre Freitas
Complex models, such as neural networks (NNs), are composed of many interrelated components. In order to represent these models, eliciting and characterising the relations between components is essential. Perhaps because of this, diagrams, as "icons of relation", are a prevalent medium for signifying complex models. Diagrams used to communicate NN architectures are currently extremely varied. The diversity in diagrammatic choices provides an opportunity to gain insight into the aspects which are being prioritised for communication. In this philosophical exploration of NN diagrams, we integrate theories of conceptual models, communication theory, and semiotics.
HC Apr 30, 2021
Structuralist analysis for neural network system diagrams
Guy Clarke Marshall, Caroline Jay, Andre Freitas
This short paper examines diagrams describing neural network systems in academic conference proceedings. Many aspects of scholarly communication are controlled, particularly with relation to text and formatting, but often diagrams are not centrally curated beyond a peer review. Using a corpus-based approach, we argue that the heterogeneous diagrammatic notations used for neural network systems have implications for signification in this domain. We divide this into (i) what content is being represented and (ii) how relations are encoded. Using a novel structuralist framework, we use a corpus analysis to quantitatively cluster diagrams according to the authors' representational choices. This quantitative diagram classification in a heterogeneous domain may provide a foundation for further analysis.
SE Apr 4, 2021
Understanding Equity, Diversity and Inclusion Challenges Within the Research Software Community
Neil P. Chue Hong, Jeremy Cohen, Caroline Jay
Research software (specialist software used to support or undertake research) is of huge importance to researchers. It contributes to significant advances in the wider world and requires collaboration between people with diverse skills and backgrounds. Analysis of recent survey data provides evidence for a lack of diversity in the Research Software Engineer community. We identify interventions which could address challenges in the wider research software community and highlight areas where the community is becoming more diverse. There are also lessons that are applicable, more generally, to the field of software development around recruitment from other disciplines and the importance of welcoming communities.
HC Aug 28, 2020
A Framework for Improving Scholarly Neural Network Diagrams
Guy Clarke Marshall, André Freitas, Caroline Jay
Neural networks are a prevalent and effective machine learning component, and their application is leading to significant scientific progress in many domains. As the field of neural network systems is fast-growing, it is important to understand how advances are communicated. Diagrams are key to this, appearing in almost all papers describing novel systems. This paper reports on a study into the use of neural network system diagrams, through interviews, card sorting, and qualitative feedback structured around ecologically-derived examples. We find high diversity of usage, perception and preference in both creation and interpretation of diagrams, examining this in the context of existing design, information visualisation, and user experience guidelines. This interview study is used to derive a framework for improving existing diagrams. This framework is evaluated through a mixed-methods experimental study, and a "corpus-based" approach examining properties of published diagrams linking the framework to citations. The studies suggest that the framework captures aspects relating to communicative efficacy of scholarly NN diagrams, and provides simple steps for their implementation.
HC Aug 26, 2020
Understanding scholarly Natural Language Processing system diagrams through application of the Richards-Engelhardt framework
Guy Clarke Marshall, Caroline Jay, André Freitas
We utilise the Richards-Engelhardt framework as a tool for understanding Natural Language Processing systems diagrams. Through four examples from scholarly proceedings, we find that the application of the framework to this ecological and complex domain is effective for reflecting on these diagrams. We argue for vocabulary to describe multiple-codings, semiotic variability, and inconsistency or misuse of visual encoding principles in diagrams. Further, for application to scholarly Natural Language Processing systems, and perhaps systems diagrams more broadly, we propose the addition of "Grouping by Object" as a new visual encoding principle, and "Emphasising" as a new visual encoding type.
SE Oct 22, 2019
Theory-Software Translation: Research Challenges and Future Directions
Caroline Jay, Robert Haines, Daniel S. Katz et al.
The Theory-Software Translation Workshop, held in New Orleans in February 2019, explored in depth the process of both instantiating theory in software - for example, implementing a mathematical model in code as part of a simulation - and using the outputs of software - such as the behavior of a simulation - to advance knowledge. As computation within research is now ubiquitous, the workshop provided a timely opportunity to reflect on the particular challenges of research software engineering - the process of developing and maintaining software for scientific discovery. In addition to the general challenges common to all software development projects, research software must also represent, manipulate, and provide data for complex theoretical constructs. Ensuring this process is robust is essential to maintaining the integrity of the science resulting from it, and the workshop highlighted a number of areas where the current approach to research software engineering would benefit from an evidence base that could be used to inform best practice. The workshop brought together expert research software engineers and academics to discuss the challenges of Theory-Software Translation over a two-day period. This report provides an overview of the workshop activities, and a synthesis of the discussion that was recorded. The body of the report presents a thematic analysis of the challenges of Theory-Software Translation as identified by workshop participants, summarises these into a set of research areas, and provides recommendations for the future direction of this work.
SE Mar 15, 2019
A Methodology for Using GitLab for Software Engineering Learning Analytics
Julio César Cortés Ríos, Kamilla Kopec-Harding, Sukru Eraslan et al.
To bridge the digital skills gap, we need to train more people in Software Engineering techniques. This paper reports on a project exploring the way students solve tasks using collaborative development platforms and version control systems, such as GitLab, to find patterns and evaluation metrics that can be used to improve the course content and reflect on the most common issues the students are facing. In this paper, we explore Learning Analytics approaches that can be used with GitLab and similar tools, and discuss the challenges raised when applying those approaches in Software Engineering Education, with the objective of building a pipeline that supports the full Learning Analytics cycle, from data extraction to data analysis. We focus in particular on the data anonymisation step of the proposed pipeline to explore the available alternatives to satisfy the data protection requirements when handling personal information in academic environments for research purposes.
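One possible form of the data anonymisation step mentioned in this abstract is keyed pseudonymisation of author names before analysis. The Python sketch below is an assumption made for illustration, not the approach reported in the paper: the function names, the commit record layout and the `user_` prefix are hypothetical. Note that keyed hashing yields pseudonyms rather than full anonymity, since anyone holding the key can recompute the mapping.

```python
from __future__ import annotations

import hashlib
import hmac


def pseudonymise(username: str, secret_key: bytes) -> str:
    """Map a username to a stable pseudonym using HMAC-SHA256 with a secret key."""
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    # Truncate the hex digest for readability; 12 hex characters is an arbitrary choice.
    return "user_" + digest.hexdigest()[:12]


def pseudonymise_commits(commits: list[dict], secret_key: bytes) -> list[dict]:
    """Return copies of commit records with the 'author' field replaced by a pseudonym."""
    return [
        {**commit, "author": pseudonymise(commit["author"], secret_key)}
        for commit in commits
    ]
```

Because the same key always produces the same pseudonym, a student's activity can still be linked across commits for analysis without exposing the original name in the analysed data.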
SE Mar 14, 2019
What Makes Research Software Sustainable? An Interview Study With Research Software Engineers
Mario Rosado de Souza, Robert Haines, Markel Vigo et al.
Software is now a vital scientific instrument, providing the tools for data collection and analysis across disciplines from bioinformatics and computational physics to the humanities. The software used in research is often home-grown and bespoke: it is constructed for a particular project, and rarely maintained beyond this, leading to rapid decay, and frequent 'reinvention of the wheel'. Understanding how to develop sustainable research software, such that it is suitable for future reuse, is therefore of interest to both researchers and funders, but how to achieve this remains an open question. Here we report the results of an interview study examining how research software engineers (the people actively developing software in an academic research environment) subjectively define software sustainability. Thematic analysis of the data reveals two interacting dimensions: intrinsic sustainability, which relates to internal qualities of software, such as modularity, encapsulation and testability, and extrinsic sustainability, concerning cultural and organisational factors, including how software is resourced, supported and shared. Research software engineers believe that an increased focus on quality and discoverability is key to increasing the sustainability of academic research software.
SE Jul 19, 2018
The State of Sustainable Research Software: Results from the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1)
Daniel S. Katz, Stephan Druskat, Robert Haines et al.
This article summarizes motivations, organization, and activities of the Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE5.1) held in Manchester, UK in September 2017. The WSSSPE series promotes sustainable research software by positively impacting principles and best practices, careers, learning, and credit. This article discusses the Code of Conduct, idea papers, position papers, experience papers, demos, and lightning talks presented during the workshop. The main part of the article discusses the speed-blogging groups that formed during the meeting, along with the outputs of those sessions.