SE · Oct 3, 2023
Can Large Language Models Transform Natural Language Intent into Formal Method Postconditions?
Madeline Endres, Sarah Fakhoury, Saikat Chakraborty et al.
Informal natural language that describes code functionality, such as code comments or function documentation, may contain substantial information about a program's intent. However, there is typically no guarantee that a program's implementation and its natural language documentation are aligned. In the case of a conflict, leveraging information in code-adjacent natural language has the potential to enhance fault localization, debugging, and code trustworthiness. In practice, however, this information is often underutilized because the inherent ambiguity of natural language makes natural language intent challenging to check programmatically. The emergent abilities of Large Language Models (LLMs) have the potential to facilitate the translation of natural language intent into programmatically checkable assertions. However, it is unclear whether LLMs can correctly translate informal natural language specifications into formal specifications that match programmer intent. Additionally, it is unclear whether such translation would be useful in practice. In this paper, we describe nl2postcond, the problem of leveraging LLMs to transform informal natural language into formal method postconditions, expressed as program assertions. We introduce and validate metrics to measure and compare different nl2postcond approaches, based on the correctness and discriminative power of generated postconditions. We then use qualitative and quantitative methods to assess the quality of nl2postcond postconditions, finding that they are generally correct and able to discriminate incorrect code. Finally, we find that nl2postcond via LLMs has the potential to be helpful in practice: nl2postcond-generated postconditions were able to catch 64 real-world historical bugs from Defects4J.
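To make the two evaluation properties concrete, here is a minimal Python sketch under stated assumptions. A postcondition is modeled as a Boolean function of a function's input and output. "Correctness" is approximated as holding on every test input for a reference implementation. "Discriminative power" is approximated as the fraction of buggy variants (mutants) that the postcondition rejects on at least one test. The `max`-style docstring, both postconditions, the mutants, and all helper names are hypothetical illustrations, not the paper's benchmark, prompts, or implementation.

```python
from typing import Callable, List, Sequence

# A postcondition relates a function's input to its output and returns True if it holds.
Postcondition = Callable[[List[int], int], bool]
Impl = Callable[[List[int]], int]


# Hypothetical docstring being translated: "Return the largest element of a non-empty list."
def strong_post(xs: List[int], result: int) -> bool:
    # The result is an element of xs and no element exceeds it.
    return result in xs and all(result >= x for x in xs)


def weak_post(xs: List[int], result: int) -> bool:
    # Holds for the correct implementation, but is too weak to reject these bugs.
    return result in xs


def reference_impl(xs: List[int]) -> int:
    return max(xs)


# Hypothetical buggy variants (mutants) of the reference implementation.
def mutant_first(xs: List[int]) -> int:
    return xs[0]


def mutant_min(xs: List[int]) -> int:
    return min(xs)


def holds_on_tests(post: Postcondition, impl: Impl, tests: Sequence[List[int]]) -> bool:
    # Run impl on every test input and check the postcondition on each result.
    return all(post(xs, impl(xs)) for xs in tests)


def is_correct(post: Postcondition, reference: Impl, tests: Sequence[List[int]]) -> bool:
    # Assumed notion of correctness: never rejects the reference implementation on the tests.
    return holds_on_tests(post, reference, tests)


def discriminative_power(post: Postcondition, mutants: Sequence[Impl],
                         tests: Sequence[List[int]]) -> float:
    # Assumed notion of discrimination: fraction of mutants that fail the postcondition on some test.
    killed = sum(1 for m in mutants if not holds_on_tests(post, m, tests))
    return killed / len(mutants)


TESTS = [[1, 3, 2], [5], [-4, -1]]
MUTANTS = [mutant_first, mutant_min]

if __name__ == "__main__":
    for name, post in [("strong", strong_post), ("weak", weak_post)]:
        print(name, is_correct(post, reference_impl, TESTS),
              discriminative_power(post, MUTANTS, TESTS))
```

Both postconditions are correct under this approximation, but only the stronger one rejects the buggy variants. A trivially true assertion can be "correct" while catching nothing, which is why it makes sense to measure correctness and discriminative power together.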
HC · Jul 31, 2025
Your Model Is Unfair, Are You Even Aware? Inverse Relationship Between Comprehension and Trust in Explainability Visualizations of Biased ML Models
Zhanna Kaufman, Madeline Endres, Cindy Xiong Bearfield et al.
Systems relying on ML have become ubiquitous, but so has biased behavior within them. Research shows that bias significantly affects stakeholders' trust in systems and how they use them. Further, stakeholders of different backgrounds view and trust the same systems differently. Thus, how ML models' behavior is explained plays a key role in comprehension and trust. We survey explainability visualizations, creating a taxonomy of design characteristics. We conduct user studies to evaluate five state-of-the-art visualization tools (LIME, SHAP, CP, Anchors, and ELI5) for model explainability, measuring how taxonomy characteristics affect comprehension, bias perception, and trust for non-expert ML users. Surprisingly, we find an inverse relationship between comprehension and trust: the better users understand the models, the less they trust them. We investigate the cause and find that this relationship is strongly mediated by bias perception: more comprehensible visualizations increase people's perception of bias, and increased bias perception reduces trust. We confirm this relationship is causal: Manipulating explainability visualizations to control comprehension, bias perception, and trust, we show that visualization design can significantly (p < 0.001) increase comprehension, increase perceived bias, and reduce trust. Conversely, reducing perceived model bias, either by improving model fairness or by adjusting visualization design, significantly increases trust even when comprehension remains high. Our work advances understanding of how comprehension affects trust and systematically investigates visualization's role in facilitating responsible ML applications.
SE · Dec 17, 2021
Hashing It Out: A Survey of Programmers' Cannabis Usage, Perception, and Motivation
Madeline Endres, Kevin Boehnke, Westley Weimer
Cannabis is one of the most common mind-altering substances. It is used both medicinally and recreationally and is enmeshed in a complex and changing legal landscape. Anecdotal evidence suggests that some software developers may use cannabis to aid some programming tasks. At the same time, anti-drug policies and tests remain common in many software engineering environments, sometimes leading to hiring shortages for certain jobs. Despite these connections, little is actually known about the prevalence of, and motivation for, cannabis use while programming. In this paper, we report the results of the first large-scale survey of cannabis use by programmers. We report findings on the cannabis usage prevalence, perceptions, and motivations of 803 developers, including 450 full-time programmers. For example, we find that some programmers do regularly use cannabis while programming: 35% of our sample has tried programming while using cannabis, and 18% currently do so at least once a month. Furthermore, this cannabis usage is primarily motivated by a perceived enhancement to certain software development skills (such as brainstorming or getting into a programming zone) rather than by medicinal reasons (such as pain relief). Finally, we find that cannabis use while programming occurs at similar rates for programming employees, managers, and students despite differences in cannabis perceptions and visibility. Our results have implications for programming job drug policies and motivate future research into cannabis use while programming.
SE · Feb 24, 2021
Relating Reading, Visualization, and Coding for New Programmers: A Neuroimaging Study
Madeline Endres, Zachary Karas, Xiaosu Hu et al.
Understanding how novices reason about coding at a neurological level has implications for training the next generation of software engineers. In recent years, medical imaging has been increasingly employed to investigate patterns of neural activity associated with coding activity. However, such studies have focused on advanced undergraduates and professionals. In a human study of 31 participants, we use functional near-infrared spectroscopy to measure the neural activity associated with introductory programming. In a controlled, contrast-based experiment, we relate brain activity when coding to that of reading natural language or mentally rotating objects (a spatial visualization task). Our primary result is that all three tasks -- coding, prose reading, and mental rotation -- are mentally distinct for novices. However, while those tasks are neurally distinct, we find more significant differences between prose and coding than between mental rotation and coding. Intriguingly, we generally find more activation in areas of the brain associated with spatial ability and task difficulty for novice coding compared to that reported in studies with more expert developers. Finally, in an exploratory analysis, we also find a neural activation pattern predictive of programming performance 11 weeks later. While preliminary, these findings both expand on previous results (e.g., relating expertise to a similarity between coding and prose reading) and also provide a new understanding of the cognitive processes underlying novice programming.