SE Apr 25, 2023
Empirical Evaluation of ChatGPT on Requirements Information Retrieval Under Zero-Shot Setting
Jianzhang Zhang, Yiyang Chen, Nan Niu et al.
Recently, various illustrative examples have shown the impressive ability of generative large language models (LLMs) to perform NLP-related tasks, and ChatGPT is undoubtedly the most representative of these models. We empirically evaluate ChatGPT's performance on requirements information retrieval (IR) tasks to derive insights into designing or developing more effective requirements retrieval methods or tools based on generative LLMs. We design an evaluation framework that considers four combinations of two popular IR tasks and two common artifact types. Under the zero-shot setting, the evaluation results reveal ChatGPT's promising ability to retrieve requirements-relevant information (high recall) and its limited ability to retrieve more specific requirements information (low precision). Our evaluation of ChatGPT on requirements IR under the zero-shot setting provides preliminary evidence for designing or developing more effective requirements IR methods or tools based on LLMs.
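The high-recall, low-precision pattern reported above can be made concrete with the standard set-based definitions of the two metrics. The sketch below is only an illustration: the function name `precision_recall` and the requirement IDs are hypothetical and are not taken from the paper's evaluation framework or data.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one retrieval result, using set overlap."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    true_positives = len(retrieved_set & relevant_set)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(retrieved_set) if retrieved_set else 0.0
    recall = true_positives / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical example: the model returns eight items, two of which are relevant,
# and no relevant item is missed.
retrieved_ids = ["R1", "R2", "R3", "R4", "R5", "R6", "R7", "R8"]
relevant_ids = ["R1", "R2"]
precision, recall = precision_recall(retrieved_ids, relevant_ids)
print(precision, recall)
```

In this made-up case every relevant item is found, yet most of the returned items are not relevant, which is the shape of result the abstract describes.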
20.9 SE Mar 30
Enhancing User-Feedback Driven Requirements Prioritization
Aurek Chattopadhyay, Nan Niu, Hui Liu et al.
Context: Requirements prioritization is a challenging problem that aims to deliver the most suitable subset from a pool of candidate requirements. The problem is NP-hard when formulated as an optimization problem. Feedback from end users can offer valuable support for software evolution, and ReFeed represents a state-of-the-art approach to automatically inferring a requirement's priority via quantifiable properties of the feedback messages associated with a candidate requirement. Objectives: In this paper, we enhance ReFeed by shifting the focus of prioritization from treating requirements as independent entities toward interconnecting them. Additionally, we explore whether interconnecting requirements provides additional value for search-based solutions. Methods: We leverage user feedback from mobile app stores to group requirements into topically coherent clusters. Such interconnectedness, in turn, helps to auto-generate additional "requires" relations among candidate requirements. These "requires" pairs are then integrated into a search-based software engineering solution. Results: The experiments on 94 requirements prioritization instances from four real-world software applications show that our enhancement outperforms ReFeed. In addition, we illustrate how incorporating interconnectedness among requirements improves search-based solutions. Conclusion: Our findings show that requirements interconnectedness improves user-feedback-driven requirements prioritization, helps uncover additional "requires" relations among candidate requirements, and also strengthens search-based release planning.
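To illustrate how "requires" pairs can constrain release planning, the sketch below selects the highest-priority subset of requirements that fits a budget and respects every "requires" relation. It is a toy under stated assumptions: exhaustive enumeration stands in for a real search-based optimizer, and the names `satisfies_requires` and `best_release`, the priorities, and the costs are hypothetical rather than the paper's method or data.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple


def satisfies_requires(selection: Iterable[str], requires: List[Tuple[str, str]]) -> bool:
    """True if, for every (a, b) pair meaning 'a requires b', choosing a implies choosing b."""
    chosen = set(selection)
    return all(dep in chosen for req, dep in requires if req in chosen)


def best_release(priority: Dict[str, float], cost: Dict[str, float],
                 requires: List[Tuple[str, str]], budget: float) -> Tuple[FrozenSet[str], float]:
    """Exhaustively pick the feasible subset with the highest total priority."""
    best: FrozenSet[str] = frozenset()
    best_score = 0.0
    names = sorted(priority)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            if sum(cost[r] for r in subset) > budget:
                continue  # over budget
            if not satisfies_requires(subset, requires):
                continue  # violates a "requires" relation
            score = sum(priority[r] for r in subset)
            if score > best_score:
                best, best_score = frozenset(subset), score
    return best, best_score


# Hypothetical instance: requirement A requires requirement B.
priority = {"A": 5, "B": 1, "C": 3}
cost = {"A": 2, "B": 1, "C": 2}
plan, score = best_release(priority, cost, requires=[("A", "B")], budget=3)
print(sorted(plan), score)
```

Enumeration is exponential in the number of requirements, which is consistent with the NP-hardness noted above and is why heuristic search is typically used at realistic scale.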
18.0 SE Apr 1
Identifying Privacy Concerns in Upcoming Software Release: A Peek into the Future
Aurek Chattopadhyay, Nan Niu
Identifying the features to be released in the next version of software, from a pool of potential candidates, is a challenging problem. User feedback from app stores is frequently used by software vendors for the evolution of apps across releases. Privacy feedback, although smaller in volume, has a larger impact on an app's success. Several existing studies have focused on summarizing privacy concerns at the app level and have also shown that developers utilize feedback to implement security- and privacy-related changes in subsequent releases. However, the current literature offers little support for release managers and developers in identifying privacy concerns prior to release. This gap exists because user reviews are typically available in app stores only after new features of a software system are released. In this paper, we introduce Pre-PI, a novel approach that summarizes privacy concerns for to-be-released features. Our method first maps existing features to semantically similar privacy reviews to learn feature-privacy review relations. We then simulate feedback for candidate features and generate concise summaries of privacy concerns. We evaluate Pre-PI across three real-world apps and compare it with Hark, a state-of-the-art method that relies on post-release user feedback to identify privacy concerns. Results show that Pre-PI generates additional valid privacy concerns and identifies these concerns earlier than Hark, allowing proactive mitigation prior to release.
SE Jul 13, 2021 Code
A First Look at Developers' Live Chat on Gitter
Lin Shi, Xiao Chen, Ye Yang et al.
Modern communication platforms such as Gitter and Slack play an increasingly critical role in supporting software teamwork, especially in open source development. Conversations on such platforms often contain intensive, valuable information that may be used for better understanding OSS developer communication and collaboration. However, little work has been done in this regard. To bridge the gap, this paper reports a first comprehensive empirical study on developers' live chat, investigating when they interact, what community structures look like, which topics are discussed, and how they interact. We manually analyze 749 dialogs in the first phase, followed by an automated analysis of over 173K dialogs in the second phase. We find that developers tend to converse more often on weekdays, especially on Wednesdays and Thursdays (UTC), that there are three common community structures, that developers tend to discuss topics such as API usages and errors, and that there are six dialog interaction patterns in the live chat communities. Based on the findings, we provide recommendations for individual developers and OSS communities, highlight desired features for platform vendors, and shed light on future research directions. We believe that the findings and insights will enable a better understanding of developers' live chat, pave the way for other researchers, and support better utilization and mining of the knowledge embedded in the massive chat history.
SE Feb 27, 2021
Extracting Concise Bug-Fixing Patches from Human-Written Patches in Version Control Systems
Yanjie Jiang, Hui Liu, Nan Niu et al.
High-quality and large-scale repositories of real bugs and their concise patches collected from real-world applications are critical for research in the software engineering community. In such a repository, each real bug is explicitly associated with its fix. Therefore, on one side, the real bugs and their fixes may inspire novel approaches for finding, locating, and repairing software bugs; on the other side, the real bugs and their fixes are indispensable for rigorous and meaningful evaluation of approaches for software testing, fault localization, and program repair. To this end, a number of such repositories, e.g., Defects4J, have been proposed. However, such repositories are rather small because their construction involves expensive human intervention. Although bug-fixing code commits as well as associated test cases can be retrieved from version control systems automatically, existing approaches cannot yet automatically extract concise bug-fixing patches from bug-fixing commits because such commits often involve bug-irrelevant changes. In this paper, we propose an automatic approach, called BugBuilder, to extract complete and concise bug-fixing patches from human-written patches in version control systems. It excludes refactorings by detecting the refactorings involved in bug-fixing commits and reapplying the detected refactorings on the faulty version. It enumerates all subsets of the remaining part and validates them on test cases. If none of the subsets has the potential to be a complete bug-fixing patch, the remaining part as a whole is taken as a complete and concise bug-fixing patch. Evaluation results on 809 real bug-fixing commits in Defects4J suggest that BugBuilder successfully generated complete and concise bug-fixing patches for forty percent of the bug-fixing commits, and its precision (99%) was even higher than that of human experts.
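The subset-enumeration step described above can be sketched as follows. This is a minimal illustration, not BugBuilder's implementation: the function name `extract_concise_patch`, the representation of changes as strings, and the `passes_tests` callback are all hypothetical. The abstract does not say what happens when a smaller subset does pass the tests, so the sketch returns `None` in that case as a labeled assumption.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, Optional


def extract_concise_patch(
    changes: Iterable[str],
    passes_tests: Callable[[FrozenSet[str]], bool],
) -> Optional[FrozenSet[str]]:
    """Accept the whole remaining change set only if no proper subset passes the tests.

    `changes` are the edits left after refactorings have been excluded.
    `passes_tests` is a hypothetical oracle that applies a subset of edits to the
    faulty version and reports whether the test suite passes.
    """
    remaining = list(changes)
    # Enumerate every non-empty proper subset, smallest first.
    for size in range(1, len(remaining)):
        for subset in combinations(remaining, size):
            if passes_tests(frozenset(subset)):
                # Assumption: a passing smaller subset means the whole set
                # cannot be accepted as concise, so no patch is returned.
                return None
    return frozenset(remaining)


# Hypothetical oracle: the tests pass only when both edits are applied.
needs_both = lambda applied: {"fix_null_check", "fix_bounds"} <= applied
patch = extract_concise_patch(["fix_null_check", "fix_bounds"], needs_both)
print(sorted(patch))
```

The number of subsets grows exponentially with the number of remaining edits, so a practical tool would need to bound or prune this enumeration.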
SE Oct 30, 2018
Multi-Location Program Repair Strategies Learned from Past Successful Experience
Shangwen Wang, Xiaoguang Mao, Nan Niu et al.
Automated program repair (APR) has great potential to reduce the effort and time spent on software maintenance and has recently become a hot topic in software engineering, with many approaches being proposed. Multi-location program repair has always been a challenge in this field because of its complexity in logic and structure. While some approaches do not claim to have features for solving multi-location bugs, they generate correct patches for these defects in practice. In this paper, we first examine multi-location bugs in Defects4J and divide them into two categories (i.e., similar and relevant multi-location bugs) based on the repair actions in their patches. We then summarize which multi-location bugs in Defects4J have been fixed by current tools. We analyze the twenty-two patches generated by current tools and propose two feasible strategies for fixing multi-location bugs, illustrating them through two detailed case studies. Finally, the experimental results demonstrate the feasibility of our methods with the repair of two bugs that have never been fixed before. By learning from successful experience in the past, this paper points out possible ways ahead for multi-location program repair.