Shurui Zhou

SE
h-index2
7papers
311citations
Novelty34%
AI Score44

7 Papers

SEMar 31, 2023
A Meta-Summary of Challenges in Building Products with ML Components -- Collecting Experiences from 4758+ Practitioners

Nadia Nahar, Haoran Zhang, Grace Lewis et al. · cmu

Incorporating machine learning (ML) components into software products raises new software-engineering challenges and exacerbates existing challenges. Many researchers have invested significant effort in understanding the challenges of industry practitioners working on building products with ML components, through interviews and surveys with practitioners. With the intention to aggregate and present their collective findings, we conduct a meta-summary study: We collect 50 relevant papers that together interacted with over 4758 practitioners using guidelines for systematic literature reviews. We then collected, grouped, and organized the over 500 mentions of challenges within those papers. We highlight the most commonly reported challenges and hope this meta-summary will be a useful resource for the research community to prioritize research and education in this field.

CRMar 28Code
Detecting Protracted Vulnerabilities in Open Source Projects

Arjun Sridharkumar, Sara Al Hajj Ibrahim, Jiayuan Zhou et al.

Timely resolution and disclosure of vulnerabilities are essential for maintaining the security of open-source software. However, many vulnerabilities remain unreported, unpatched, or undisclosed for extended periods, exposing users to prolonged security threats. While various vulnerability detection tools exist, they primarily focus on predicting or identifying known vulnerabilities, often failing to capture vulnerabilities that experience significant delays in resolution. In this study, we examine the vulnerability lifecycle by analyzing protracted vulnerabilities (PCVEs), which remain unresolved or undisclosed over long periods. We construct a dataset of PCVEs and conduct a qualitative analysis to uncover underlying causes of delay. To assess current automated solutions, we evaluate four state-of-the-art (SOTA) vulnerability detectors on our dataset. These tools detect only 1,059 out of 2,402 PCVEs, achieving approximately 44% coverage. To address this limitation, we propose DeeptraVul, an enhanced detection approach designed specifically for protracted cases. DeeptraVul integrates multiple development artifacts and code signals, supported by a Large Language Model (LLM)-based summarization component. For comparison, we also evaluate a standalone LLM. Our results show that DeeptraVul improves detection performance, achieving a 14% increase in coverage across all PCVEs and reaching 90% coverage on the DeeptraVul PCVE subset, outperforming existing SOTA detectors and standalone LLM based inference.

SEApr 13, 2022
Aspirations and Practice of Model Documentation: Moving the Needle with Nudging and Traceability

Avinash Bhat, Austin Coursey, Grace Hu et al.

The documentation practice for machine-learned (ML) models often falls short of established practices for traditional software, which impedes model accountability and inadvertently abets inappropriate or misuse of models. Recently, model cards, a proposal for model documentation, have attracted notable attention, but their impact on the actual practice is unclear. In this work, we systematically study the model documentation in the field and investigate how to encourage more responsible and accountable documentation practice. Our analysis of publicly available model cards reveals a substantial gap between the proposal and the practice. We then design a tool named DocML aiming to (1) nudge the data scientists to comply with the model cards proposal during the model development, especially the sections related to ethics, and (2) assess and manage the documentation quality. A lab study reveals the benefit of our tool towards long-term documentation quality and accountability.

SEMar 10
"Should I Give Up Now?" Investigating LLM Pitfalls in Software Engineering

Jiessie Tie, Bingsheng Yao, Tianshi Li et al.

Software engineers are increasingly incorporating AI assistants into their workflows to enhance productivity and alleviate cognitive load. However, experiences with large language models (LLMs) such as ChatGPT vary widely. While some engineers find them useful, others deem them counterproductive due to inaccuracies in their responses. Researchers have also observed that ChatGPT often provides incorrect information. Given these limitations, it is crucial to determine how to effectively integrate LLMs into software engineering (SE) workflow. Analyzing data from 26 participants in a complex web development task, we identified nine failure types categorized into incorrect or incomplete responses, cognitive overload, and context loss. Users attempted to mitigate these issues through scaffolding, prompt clarification, and debugging. However, 17 participants ultimately chose to abandon ChatGPT due to persistent failures. Our quantitative analysis revealed that unhelpful responses increased the likelihood of abandonment by a factor of 11, while each additional prompt reduced abandonment probability by 17%. This study advances the understanding of human-AI interaction in SE tasks and outlines directions for future research and tooling support.

SEFeb 21, 2025Code
LLMs in Mobile Apps: Practices, Challenges, and Opportunities

Kimberly Hau, Safwat Hassan, Shurui Zhou

The integration of AI techniques has become increasingly popular in software development, enhancing performance, usability, and the availability of intelligent features. With the rise of large language models (LLMs) and generative AI, developers now have access to a wealth of high-quality open-source models and APIs from closed-source providers, enabling easier experimentation and integration of LLMs into various systems. This has also opened new possibilities in mobile application (app) development, allowing for more personalized and intelligent apps. However, integrating LLM into mobile apps might present unique challenges for developers, particularly regarding mobile device constraints, API management, and code infrastructure. In this project, we constructed a comprehensive dataset of 149 LLM-enabled Android apps and conducted an exploratory analysis to understand how LLMs are deployed and used within mobile apps. This analysis highlights key characteristics of the dataset, prevalent integration strategies, and common challenges developers face. Our findings provide valuable insights for future research and tooling development aimed at enhancing LLM-enabled mobile apps.

SEMar 17, 2020Code
An Exploratory Study to Find Motives Behind Cross-platform Forks from Software Heritage Dataset

Avijit Bhattacharjee, Sristy Sumana Nath, Shurui Zhou et al.

The fork-based development mechanism provides the flexibility and the unified processes for software teams to collaborate easily in a distributed setting without too much coordination overhead.Currently, multiple social coding platforms support fork-based development, such as GitHub, GitLab, and Bitbucket. Although these different platforms virtually share the same features, they have different emphasis. As GitHub is the most popular platform and the corresponding data is publicly available, most of the current studies are focusing on GitHub hosted projects. However, we observed anecdote evidences that people are confused about choosing among these platforms, and some projects are migrating from one platform to another, and the reasons behind these activities remain unknown.With the advances of Software Heritage Graph Dataset (SWHGD),we have the opportunity to investigate the forking activities across platforms. In this paper, we conduct an exploratory study on 10popular open-source projects to identify cross-platform forks and investigate the motivation behind. Preliminary result shows that cross-platform forks do exist. For the 10 subject systems in this study, we found 81,357 forks in total among which 179 forks are on GitLab. Based on our qualitative analysis, we found that most of the cross-platform forks that we identified are mirrors of the repositories on another platform, but we still find cases that were created due to preference of using certain functionalities (e.g. Continuous Integration (CI)) supported by different platforms. This study lays the foundation of future research directions, such as understanding the differences between platforms and supporting cross-platform collaboration.

SEOct 19, 2021
Collaboration Challenges in Building ML-Enabled Systems: Communication, Documentation, Engineering, and Process

Nadia Nahar, Shurui Zhou, Grace Lewis et al.

The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces additional challenges with its exploratory model development process, additional skills and knowledge needed, difficulties testing ML systems, need for continuous evolution and monitoring, and non-traditional quality requirements such as fairness and explainability. Through interviews with 45 practitioners from 28 organizations, we identified key collaboration challenges that teams face when building and deploying ML systems into production. We report on common collaboration points in the development of production ML systems for requirements, data, and integration, as well as corresponding team patterns and challenges. We find that most of these challenges center around communication, documentation, engineering, and process and collect recommendations to address these challenges.