CRMar 29
How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks?Ying Zhang, Wenjia Song, Zhengjie Ji et al.
Developers often build software on top of third-party libraries (Libs) to improve productivity, but these libraries may contain vulnerabilities that enable supply chain attacks. Existing tools detect vulnerable dependencies, yet developers often distrust their reports without concrete exploit evidence. Manually crafting such demonstrations is costly, and tool support is lacking. To help developers enhance software security, in this study, we systematically explored the usage of a large language model (LLM) --ChatGPT-4.0--to generate security tests, which unit tests demonstrate how vulnerable library dependencies facilitate the supply chain attacks to given Apps. In our exploration, we defined prompt templates to take in the various vulnerability-relevant information we manually collected, and generated prompts from those templates to query ChatGPT for security test generation. We found that ChatGPT-generated tests demonstrated 24 pieces of evidence or proof of vulnerability for 49 Apps. To assess the consistency of test generation, we also evaluated another five state-of-the-art LLMs. All the models generated security tests for at least 17 cases that successfully demonstrate the vulnerabilities. We filed six reports for the newly revealed vulnerabilities in Apps, and got four Common Vulnerability Entries (CVEs) assigned. Our use of ChatGPT outperformed two state-of-the-art security test generators (TRANSFER and SIEGE), by generating a lot more tests and achieving more attacks.
LGJul 19, 2022
Machine learning approach in the development of building occupant personasSheik Murad Hassan Anik, Xinghua Gao, Na Meng
The user persona is a communication tool for designers to generate a mental model that describes the archetype of users. Developing building occupant personas is proven to be an effective method for human-centered smart building design, which considers occupant comfort, behavior, and energy consumption. Optimization of building energy consumption also requires a deep understanding of occupants' preferences and behaviors. The current approaches to developing building occupant personas face a major obstruction of manual data processing and analysis. In this study, we propose and evaluate a machine learning-based semi-automated approach to generate building occupant personas. We investigate the 2015 Residential Energy Consumption Dataset with five machine learning techniques - Linear Discriminant Analysis, K Nearest Neighbors, Decision Tree (Random Forest), Support Vector Machine, and AdaBoost classifier - for the prediction of 16 occupant characteristics, such as age, education, and, thermal comfort. The models achieve an average accuracy of 61% and accuracy over 90% for attributes including the number of occupants in the household, their age group, and preferred usage of heating or cooling equipment. The results of the study show the feasibility of using machine learning techniques for the development of building occupant persona to minimize human effort.
CVSep 1, 2024
Equitable Skin Disease Prediction Using Transfer Learning and Domain AdaptationSajib Acharjee Dip, Kazi Hasan Ibn Arif, Uddip Acharjee Shuvo et al.
In the realm of dermatology, the complexity of diagnosing skin conditions manually necessitates the expertise of dermatologists. Accurate identification of various skin ailments, ranging from cancer to inflammatory diseases, is paramount. However, existing artificial intelligence (AI) models in dermatology face challenges, particularly in accurately diagnosing diseases across diverse skin tones, with a notable performance gap in darker skin. Additionally, the scarcity of publicly available, unbiased datasets hampers the development of inclusive AI diagnostic tools. To tackle the challenges in accurately predicting skin conditions across diverse skin tones, we employ a transfer-learning approach that capitalizes on the rich, transferable knowledge from various image domains. Our method integrates multiple pre-trained models from a wide range of sources, including general and specific medical images, to improve the robustness and inclusiveness of the skin condition predictions. We rigorously evaluated the effectiveness of these models using the Diverse Dermatology Images (DDI) dataset, which uniquely encompasses both underrepresented and common skin tones, making it an ideal benchmark for assessing our approach. Among all methods, Med-ViT emerged as the top performer due to its comprehensive feature representation learned from diverse image sources. To further enhance performance, we conducted domain adaptation using additional skin image datasets such as HAM10000. This adaptation significantly improved model performance across all models.
SEFeb 22, 2021Code
Automatic Detection and Resolution of Software Merge Conflicts: Are We There Yet?Bowen Shen, Cihan Xiao, Na Meng et al.
Developers create software branches for tentative feature addition and bug fixing, and periodically merge branches to release software with new features or repairing patches. When the program edits from different branches textually overlap (i.e., textual conflicts), or the co-application of those edits lead to compilation or runtime errors (i.e., compiling or dynamic conflicts), it is challenging and time-consuming for developers to eliminate merge conflicts. Prior studies examined %the popularity of merge conflicts and how conflicts were related to code smells or software development process; tools were built to find and solve conflicts. However, some fundamental research questions are still not comprehensively explored, including (1) how conflicts were introduced, (2) how developers manually resolved conflicts, and (3) what conflicts cannot be handled by current tools. For this paper, we took a hybrid approach that combines automatic detection with manual inspection to reveal 204 merge conflicts and their resolutions in 15 open-source repositories. %in the version history of 15 open-source projects. Our data analysis reveals three phenomena. First, compiling and dynamic conflicts are harder to detect, although current tools mainly focus on textual conflicts. Second, in the same merging context, developers usually resolved similar textual conflicts with similar strategies. Third, developers manually fixed most of the inspected compiling and dynamic conflicts by similarly editing the merged version as what they did for one of the branches. Our research reveals the challenges and opportunities for automatic detection and resolution of merge conflicts; it also sheds light on related areas like systematic program editing and change recommendation.
SEFeb 15, 2021Code
Investigating and Recommending Co-Changed Entities for JavaScript ProgramsZijian Jiang, Hao Zhong, Na Meng
JavaScript (JS) is one of the most popular programming languages due to its flexibility and versatility, but maintaining JS code is tedious and error-prone. In our research, we conducted an empirical study to characterize the relationship between co-changed software entities (e.g., functions and variables), and built a machine learning (ML)-based approach to recommend additional entity to edit given developers' code changes. Specifically, we first crawled 14,747 commits in 10 open-source projects; for each commit, we created one or more change dependency graphs (CDGs) to model the referencer-referencee relationship between co-changed entities. Next, we extracted the common subgraphs between CDGs to locate recurring co-change patterns between entities. Finally, based on those patterns, we extracted code features from co-changed entities and trained an ML model that recommends entities-to-change given a program commit. According to our empirical investigation, (1) three recurring patterns commonly exist in all projects; (2) 80%--90% of co-changed function pairs either invoke the same function(s), access the same variable(s), or contain similar statement(s); (3) our ML-based approach CoRec recommended entity changes with high accuracy (73%--78%). CoRec complements prior work because it suggests changes based on program syntax, textual similarity, as well as software history; it achieved higher accuracy than two existing tools in our evaluation.
CRFeb 13, 2021Code
Data-Driven Vulnerability Detection and Repair in Java CodeYing Zhang, Mahir Kabir, Ya Xiao et al.
Java platform provides various APIs to facilitate secure coding. However, correctly using security APIs is usually challenging for developers who lack cybersecurity training. Prior work shows that many developers misuse security APIs; such misuses can introduce vulnerabilities into software, void security protections, and present security exploits to hackers. To eliminate such API-related vulnerabilities, this paper presents SEADER -- our new approach that detects and repairs security API misuses. Given an exemplar, insecure code snippet, and its secure counterpart, SEADER compares the snippets and conducts data dependence analysis to infer the security API misuse templates and corresponding fixing operations. Based on the inferred information, given a program, SEADER performs inter-procedural static analysis to search for any security API misuse and to propose customized fixing suggestions for those vulnerabilities. To evaluate SEADER, we applied it to 25 <insecure, secure> code pairs, and SEADER successfully inferred 18 unique API misuse templates and related fixes. With these vulnerability repair patterns, we further applied SEADER to 10 open-source projects that contain in total 32 known vulnerabilities. Our experiment shows that SEADER detected vulnerabilities with 100% precision, 84% recall, and 91% accuracy. Additionally, we applied SEADER to 100 Apache open-source projects and detected 988 vulnerabilities; SEADER always customized repair suggestions correctly. Based on SEADER's outputs, we filed 60 pull requests. Up till now, developers of 18 projects have offered positive feedbacks on SEADER's suggestions. Our results indicate that SEADER can effectively help developers detect and fix security API misuses. Whereas prior work either detects API misuses or suggests simple fixes, SEADER is the first tool to do both for nontrivial vulnerability repairs.
CRMay 5
Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex SoftwareShravya Kanchi, Xiaoyan Zang, Ying Zhang et al.
Developers create modern software applications (Apps) on top of third-party libraries (Libs). When library vulnerabilities are reachable through application code, the applications can be vulnerable to software supply chain attacks. Prior work shows that developers often require concrete and executable evidence, i.e., proof-of-vulnerability (PoV) tests, to decide whether a reported dependency vulnerability poses a practical security risk to their application. However, manually crafting such tests is challenging, and existing tool support is insufficient to automate the procedure. To streamline test generation, we created PoVSmith -- a new approach that combines call path analysis, exemplar test, code context, and feedback into multiple prompts to guide a coding agent (i.e., Codex) and a large language model (i.e., GPT) for test generation, execution, and assessment. We evaluated PoVSmith on 33 $\langle$App, Lib$\rangle$ Java program pairs, where each App depends on a vulnerable Lib. PoVSmith revealed 158 unique application-level entry points (i.e., public methods) calling vulnerable library APIs; 152 (96\%) of them were correctly found, together with the call paths properly recognized. With such method call information, PoVSmith generated 152 tests, 84 (55\%) of which demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities. PoVSmith substantially outperforms the state-of-the-art LLM-based approach, as it reduces human involvement while dramatically improving test quality. Our work contributes (1) a novel approach of agent-based test generation, (2) an iterative code refinement process driven by execution feedback, and (3) LLM-based quality assessment grounded in both the test context and execution logs.
SEJul 14, 2021
FAPR: Fast and Accurate Program Repair for Introductory Programming CoursesYunlong Lu, Na Meng, Wenxin Li
In introductory programming courses, it is challenging for instructors to provide debugging feedback on students' incorrect programs. Some recent tools automatically offer program repair feedback by identifying any differences between incorrect and correct programs, but suffer from issues related to scalability, accuracy, and cross-language portability. This paper presents FAPR -- our novel approach that suggests repairs based on program differences in a fast and accurate manner. FAPR is different from current tools in three aspects. First, it encodes syntactic information into token sequences to enable high-speed comparison between incorrect and correct programs. Second, to accurately extract program differences, FAPR adopts a novel matching algorithm that maximizes token-level matches and minimizes statement-level differences. Third, FAPR relies on testing instead of static/dynamic analysis to validate and refine candidate repairs, so it eliminates the language dependency or high runtime overhead incurred by complex program analysis. We implemented FAPR to suggest repairs for both C and C++ programs; our experience shows the great cross-language portability of FAPR. More importantly, we empirically compared FAPR with a state-of-the-art tool Clara. FAPR suggested repairs for over 95.5% of incorrect solutions. We sampled 250 repairs among FAPR's suggestions, and found 89.6% of the samples to be minimal and correct. FAPR outperformed Clara by suggesting repairs for more cases, creating smaller repairs, producing higher-quality fixes, and causing lower runtime overheads. Our results imply that FAPR can potentially help instructors or TAs to effectively locate bugs in incorrect code, and to provide debugging hints/guidelines based on those generated repairs.
SEMay 6, 2021
Migrating Client Code without Change ExamplesHao Zhong, Na Meng
API developers evolve software libraries to fix bugs, add new features, or refactor code. To benefit from such library evolution, the programmers of client projects have to repetitively upgrade their library usages and adapt their codebases to any library API breaking changes (e.g., API renaming). Such adaptive changes can be tedious and error-prone. Existing tools provide limited support to help programmers migrate client projects from old library versions to new ones. For instance, some tools extract API mappings be-tween library versions and only suggest simple adaptive changes (i.e., statement updates); other tools suggest or automate more complicated edits (e.g., statement insertions) based on user-provided exemplar code migrations. However, when new library versions are available, it is usually cumbersome and time-consuming for users to provide sufficient human-crafted samples in order to guide automatic migration. In this paper, we propose a novel approach, AutoUpdate, to further improve the state of the art. Instead of learning from change examples, we designed AutoUpdate to automate migration in a compiler-directed way. Namely, given a compilation error triggered by upgrading libraries, AutoUpdate exploits 13 migration opera-tors to generate candidate edits, and tentatively applies each edit until the error is resolved or all edits are explored. We conducted two experiments. The first experiment involves migrating 371 tutorial examples between versions of 5 popular libraries. AutoUpdate reduced migration-related compilation errors for 92.7% of tasks. It eliminated such errors for 32.4% of tasks, and 33.9% of the tasks have identical edits to manual migrations. In the second experiment, we applied AutoUpdate to migrate two real client projects of lucene. AutoUpdate successfully migrated both projects, and the migrated code passed all tests.
SEFeb 25, 2021
A Lightweight Approach of Human-Like PlaytestingYan Zhao, Weihao Zhang, Enyi Tang et al.
A playtest is the process in which human testers are recruited to play video games and to reveal software bugs. Manual testing is expensive and time-consuming, especially when there are many mobile games to test and every software version requires for extensive testing before being released. Existing testing frameworks (e.g., Android Monkey) are limited because they adopt no domain knowledge to play games. Learning-based tools (e.g., Wuji) involve a huge amount of training data and computation before testing any game. This paper presents LIT -- our lightweight approach to generalize playtesting tactics from manual testing, and to adopt the generalized tactics to automate game testing. LIT consists of two phases. In Phase I, while a human plays an Android game app G for a short period of time (e.g., eight minutes), \tool records the user's actions (e.g., swipe) and the scene before each action. Based on the collected data, LIT generalizes a set of \emph{context-aware, abstract playtesting tactics} which describe under what circumstances, what actions can be taken to play the game. In Phase II, LIT tests G based on the generalized tactics. Namely, given a randomly generated game scene, LIT searches match for the abstract context of any inferred tactic; if there is a match, LIT customizes the tactic and generates a feasible event to play the game. Our evaluation with nine games shows LIT to outperform two state-of-the-art tools. This implies that by automating playtest, LIT will significantly reduce manual testing and boost the quality of game apps.
SEFeb 24, 2021
Hero: On the Chaos When PATH Meets ModulesYing Wang, Liang Qiao, Chang Xu et al.
Ever since its first release in 2009, the Go programming language (Golang) has been well received by software communities. A major reason for its success is the powerful support of library-based development, where a Golang project can be conveniently built on top of other projects by referencing them as libraries. As Golang evolves, it recommends the use of a new library-referencing mode to overcome the limitations of the original one. While these two library modes are incompatible, both are supported by the Golang ecosystem. The heterogeneous use of library-referencing modes across Golang projects has caused numerous dependency management (DM) issues, incurring reference inconsistencies and even build failures. Motivated by the problem, we conducted an empirical study to characterize the DM issues, understand their root causes, and examine their fixing solutions. Based on our findings, we developed \textsc{Hero}, an automated technique to detect DM issues and suggest proper fixing solutions. We applied \textsc{Hero} to 19,000 popular Golang projects. The results showed that \textsc{Hero} achieved a high detection rate of 98.5\% on a DM issue benchmark and found 2,422 new DM issues in 2,356 popular Golang projects. We reported 280 issues, among which 181 (64.6\%) issues have been confirmed, and 160 of them (88.4\%) have been fixed or are under fixing. Almost all the fixes have adopted our fixing suggestions.
CRJul 28, 2020
Coding Practices and Recommendations of Spring Security for Enterprise ApplicationsMazharul Islam, Sazzadur Rahaman, Na Meng et al.
Spring security is tremendously popular among practitioners for its ease of use to secure enterprise applications. In this paper, we study the application framework misconfiguration vulnerabilities in the light of Spring security, which is relatively understudied in the existing literature. Towards that goal, we identify 6 types of security anti-patterns and 4 insecure vulnerable defaults by conducting a measurement-based approach on 28 Spring applications. Our analysis shows that security risks associated with the identified security anti-patterns and insecure defaults can leave the enterprise application vulnerable to a wide range of high-risk attacks. To prevent these high-risk attacks, we also provide recommendations for practitioners. Consequently, our study has contributed one update to the official Spring security documentation while other security issues identified in this study are being considered for future major releases by Spring security community.
SEJan 4, 2019
How Reliable is the Crowdsourced Knowledge of Security Implementation?Mengsu Chen, Felix Fischer, Na Meng et al.
Stack Overflow (SO) is the most popular online Q&A site for developers to share their expertise in solving programming issues. Given multiple answers to certain questions, developers may take the accepted answer, the answer from a person with high reputation, or the one frequently suggested. However, researchers recently observed exploitable security vulnerabilities in popular SO answers. This observation inspires us to explore the following questions: How much can we trust the security implementation suggestions on SO? If suggested answers are vulnerable, can developers rely on the community's dynamics to infer the vulnerability and identify a secure counterpart? To answer these highly important questions, we conducted a study on SO posts by contrasting secure and insecure advices with the community-given content evaluation. We investigated whether SO incentive mechanism is effective in improving security properties of distributed code examples. Moreover, we also traced duplicated answers to assess whether the community behavior facilitates propagation of secure and insecure code suggestions. We compiled 953 different groups of similar security-related code examples and labeled their security, identifying 785 secure answer posts and 644 insecure ones. Compared with secure suggestions, insecure ones had higher view counts (36,508 vs. 18,713), received a higher score (14 vs. 5), and had significantly more duplicates (3.8 vs. 3.0) on average. 34% of the posts provided by highly reputable so-called trusted users were insecure. Our findings show that there are lots of insecure snippets on SO, while the community-given feedback does not allow differentiating secure from insecure choices. Moreover, the reputation mechanism fails in indicating trustworthy users with respect to security questions, ultimately leaving other users wandering around alone in a software security minefield.
SEJul 30, 2018
Automatic Clone Recommendation for Refactoring Based on the Present and the PastRuru Yue, Zhe Gao, Na Meng et al.
When many clones are detected in software programs, not all clones are equally important to developers. To help developers refactor code and improve software quality, various tools were built to recommend clone-removal refactorings based on the past and the present information, such as the cohesion degree of individual clones or the co-evolution relations of clone peers. The existence of these tools inspired us to build an approach that considers as many factors as possible to more accurately recommend clones. This paper introduces CREC, a learning-based approach that recommends clones by extracting features from the current status and past history of software projects. Given a set of software repositories, CREC first automatically extracts the clone groups historically refactored (R-clones) and those not refactored (NR-clones) to construct the training set. CREC extracts 34 features to characterize the content and evolution behaviors of individual clones, as well as the spatial, syntactical, and co-change relations of clone peers. With these features, CREC trains a classifier that recommends clones for refactoring. We designed the largest feature set thus far for clone recommendation, and performed an evaluation on six large projects. The results show that our approach suggested refactorings with 83% and 76% F-scores in the within-project and cross-project settings. CREC significantly outperforms a state-of-the-art similar approach on our data set, with the latter one achieving 70% and 50% F-scores. We also compared the effectiveness of different factors and different learning algorithms.
CRSep 28, 2017
Secure Coding Practices in Java: Challenges and VulnerabilitiesNa Meng, Stefan Nagy, Daphne Yao et al.
Java platform and third-party libraries provide various security features to facilitate secure coding. However, misusing these features can cost tremendous time and effort of developers or cause security vulnerabilities in software. Prior research was focused on the misuse of cryptography and SSL APIs, but did not explore the key fundamental research question: what are the biggest challenges and vulnerabilities in secure coding practices? In this paper, we conducted a comprehensive empirical study on StackOverflow posts to understand developers' concerns on Java secure coding, their programming obstacles, and potential vulnerabilities in their code. We observed that developers have shifted their effort to the usage of authentication and authorization features provided by Spring security--a third-party framework designed to secure enterprise applications. Multiple programming challenges are related to APIs or libraries, including the complicated cross-language data handling of cryptography APIs, and the complex Java-based or XML-based approaches to configure Spring security. More interestingly, we identified security vulnerabilities in the suggested code of accepted answers. The vulnerabilities included using insecure hash functions such as MD5, breaking SSL/TLS security through bypassing certificate validation, and insecurely disabling the default protection against Cross Site Request Forgery (CSRF) attacks. Our findings reveal the insufficiency of secure coding assistance and education, and the gap between security theory and coding practices.