CR Mar 29
How Can ChatGPT Support Human Security Testers to Help Mitigate Supply Chain Attacks?
Ying Zhang, Wenjia Song, Zhengjie Ji et al.
Developers often build software on top of third-party libraries (Libs) to improve productivity, but these libraries may contain vulnerabilities that enable supply chain attacks. Existing tools detect vulnerable dependencies, yet developers often distrust their reports without concrete exploit evidence. Manually crafting such demonstrations is costly, and tool support is lacking. To help developers enhance software security, in this study we systematically explored using a large language model (LLM), ChatGPT-4.0, to generate security tests: unit tests that demonstrate how vulnerable library dependencies facilitate supply chain attacks on given Apps. We defined prompt templates that take in the vulnerability-relevant information we manually collected, and generated prompts from those templates to query ChatGPT for security tests. ChatGPT-generated tests demonstrated 24 pieces of evidence or proof of vulnerability for 49 Apps. To assess the consistency of test generation, we also evaluated five other state-of-the-art LLMs; every model generated security tests that successfully demonstrated the vulnerabilities in at least 17 cases. We filed six reports for the newly revealed vulnerabilities in Apps and were assigned four Common Vulnerabilities and Exposures (CVE) identifiers. Our use of ChatGPT outperformed two state-of-the-art security test generators (TRANSFER and SIEGE), generating many more tests and achieving more successful attacks.
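The prompt-template step can be sketched as follows. This is a minimal illustration that assumes Java Apps tested with JUnit. The template wording and every field name (`app`, `library`, `version`, `cve`, `description`, `api`, `entry_point`) are hypothetical and are not the templates used in the study. The idea is simply that one template, filled from manually collected vulnerability information, yields one prompt per App.

```python
from string import Template

# Hypothetical prompt template: the wording and field names are illustrative
# assumptions, not the study's actual templates.
SECURITY_TEST_TEMPLATE = Template(
    "The Java application $app depends on $library $version, "
    "which is affected by $cve ($description). "
    "The application reaches the vulnerable library API $api "
    "through its method $entry_point. "
    "Write a JUnit test that calls $entry_point with attacker-controlled "
    "input and demonstrates the vulnerability through the application."
)


def build_prompt(info: dict[str, str]) -> str:
    """Fill the template; Template.substitute raises KeyError if a field is missing."""
    return SECURITY_TEST_TEMPLATE.substitute(info)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = {
        "app": "demo-app",
        "library": "example-parser",
        "version": "1.0.0",
        "cve": "CVE-XXXX-XXXX",
        "description": "unsafe deserialization of untrusted input",
        "api": "Parser.parse(String)",
        "entry_point": "ConfigLoader.load(String)",
    }
    print(build_prompt(example))
```

Using `substitute` rather than `safe_substitute` makes a missing piece of vulnerability information fail loudly instead of silently leaving a `$field` placeholder in the prompt.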
QM Nov 13, 2023
Deep Phenotyping of Non-Alcoholic Fatty Liver Disease Patients with Genetic Factors for Insights into the Complex Disease
Tahmina Sultana Priya, Fan Leng, Anthony C. Luehrs et al.
Non-alcoholic fatty liver disease (NAFLD) is a prevalent chronic liver disorder characterized by excessive accumulation of fat in the liver of individuals who do not consume significant amounts of alcohol; its risk factors include obesity, insulin resistance, and type 2 diabetes. We aim to identify subgroups of NAFLD patients based on demographic, clinical, and genetic characteristics for precision medicine. The genomic and phenotypic data (3,408 cases and 4,739 controls) for this study were gathered from participants in the Mayo Clinic Tapestry Study (IRB#19-000001) and their electronic health records. These include demographic, clinical, and comorbidity data, with genotype information obtained through whole exome sequencing performed at Helix using the Exome+® Assay according to standard procedure (www.helix.com). Factors highly relevant to NAFLD were identified with the chi-square test and a stepwise backward-forward regression model. Latent class analysis (LCA) was performed on NAFLD cases using significant indicator variables to identify subgroups. The optimal clustering revealed 5 latent subgroups among 2,013 NAFLD patients (mean age 60.6 years; 62.1% women). A polygenic risk score based on 6 single-nucleotide polymorphism (SNP) variants, together with disease outcomes, was used to analyze the subgroups. The groups are characterized by metabolic syndrome, obesity, different comorbidities, psychoneurological factors, and genetic factors. Odds ratios were used to compare the risk of complex diseases, such as fibrosis, cirrhosis, and hepatocellular carcinoma (HCC), as well as liver failure, between clusters. Cluster 2 has a significantly higher rate of complex disease outcomes than the other clusters. Keywords: Fatty liver disease; Polygenic risk score; Precision medicine; Deep phenotyping; NAFLD comorbidities; Latent class analysis.
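The two quantitative tools named above, a polygenic risk score and odds ratios, can be sketched generically as follows. This is not the study's pipeline. We assume the common additive PRS definition: each SNP's risk-allele dosage times a per-SNP weight, summed. The abstract does not give the 6 SNPs or their weights, so the weights and 2x2 counts below are invented for illustration. The confidence interval uses the standard Wald approximation on the log-odds scale.

```python
import math


def polygenic_risk_score(dosages: list[int], weights: list[float]) -> float:
    """Additive PRS: sum over SNPs of risk-allele count (0, 1, or 2) times its weight."""
    if len(dosages) != len(weights):
        raise ValueError("need one weight per SNP")
    if any(d not in (0, 1, 2) for d in dosages):
        raise ValueError("dosages must be 0, 1, or 2")
    return sum(d * w for d, w in zip(dosages, weights))


def odds_ratio(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Odds ratio with a Wald 95% CI from a 2x2 table.

    a: outcome present in the group of interest (e.g., one cluster)
    b: outcome absent in that group
    c: outcome present in the reference group
    d: outcome absent in the reference group
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; the Wald interval is undefined")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    low = math.exp(math.log(ratio) - 1.96 * se)
    high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, low, high


if __name__ == "__main__":
    # Hypothetical weights for six SNPs and one patient's genotype dosages.
    print(polygenic_risk_score([0, 1, 2, 1, 0, 2], [0.3, 0.2, 0.1, 0.4, 0.25, 0.15]))
    # Hypothetical counts: an outcome in one cluster vs. all other clusters.
    print(odds_ratio(20, 80, 10, 90))
```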
CR Dec 7, 2021
Evaluation of Static Vulnerability Detection Tools with Java Cryptographic API Benchmarks
Sharmin Afrose, Ya Xiao, Sazzadur Rahaman et al.
Several studies have shown that misuses of cryptographic APIs are common in real-world code (e.g., Apache projects and Android apps). Several open-source and commercial security tools automatically screen Java programs to detect such misuses. To compare their accuracy and security guarantees, we develop two comprehensive benchmarks, CryptoAPI-Bench and ApacheCryptoAPI-Bench. CryptoAPI-Bench consists of 181 unit test cases covering basic cases as well as complex ones, including interprocedural, field-sensitive, multi-class, and path-sensitive data flows of misuses. The benchmark also includes correct (secure) cases for measuring false-positive rates. ApacheCryptoAPI-Bench consists of 121 cryptographic cases from 10 Apache projects. We evaluate four tools, SpotBugs, CryptoGuard, CrySL, and Coverity, on both benchmarks and present their performance along with a comparative analysis. ApacheCryptoAPI-Bench also examines the scalability of the tools. Our benchmarks are useful for advancing state-of-the-art solutions in the space of misuse detection.
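CryptoAPI-Bench pairs misuse cases with correct cases, so a tool's report can be scored for both missed misuses and false positives. The sketch below shows one assumed way to do that scoring. The case identifiers are hypothetical, and we assume each tool's report has already been reduced to the set of benchmark cases it flags.

```python
def score_tool(flagged: set[str], misuse_cases: set[str], correct_cases: set[str]) -> dict[str, float]:
    """Score a tool's flagged cases against benchmark ground truth."""
    tp = len(flagged & misuse_cases)    # misuses correctly reported
    fp = len(flagged & correct_cases)   # secure cases wrongly reported
    fn = len(misuse_cases - flagged)    # misuses the tool missed
    tn = len(correct_cases - flagged)   # secure cases correctly left alone
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical benchmark case identifiers.
    misuses = {"misuse_ecb_mode", "misuse_static_iv", "misuse_hardcoded_key", "misuse_weak_hash"}
    correct = {"correct_gcm_mode", "correct_random_iv"}
    flagged = {"misuse_ecb_mode", "misuse_static_iv", "misuse_hardcoded_key", "correct_random_iv"}
    print(score_tool(flagged, misuses, correct))
    # {'precision': 0.75, 'recall': 0.75, 'false_positive_rate': 0.5}
```

Without the correct cases, a tool that flags everything would score perfect recall at no visible cost. The secure counterparts are what expose that behavior through the false-positive rate.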
CR Feb 13, 2021
Data-Driven Vulnerability Detection and Repair in Java Code
Ying Zhang, Mahir Kabir, Ya Xiao et al.
The Java platform provides various APIs to facilitate secure coding. However, correctly using security APIs is usually challenging for developers who lack cybersecurity training. Prior work shows that many developers misuse security APIs. Such misuses can introduce vulnerabilities into software, void security protections, and open the door to exploits by attackers. To eliminate such API-related vulnerabilities, this paper presents SEADER, our new approach that detects and repairs security API misuses. Given an exemplar insecure code snippet and its secure counterpart, SEADER compares the snippets and conducts data dependence analysis to infer security API misuse templates and the corresponding fixing operations. Based on the inferred information, given a program, SEADER performs inter-procedural static analysis to search for security API misuses and to propose customized fixing suggestions for those vulnerabilities. To evaluate SEADER, we applied it to 25 <insecure, secure> code pairs, and SEADER successfully inferred 18 unique API misuse templates and related fixes. With these vulnerability repair patterns, we further applied SEADER to 10 open-source projects that contain 32 known vulnerabilities in total. Our experiment shows that SEADER detected vulnerabilities with 100% precision, 84% recall, and 91% accuracy. Additionally, we applied SEADER to 100 Apache open-source projects and detected 988 vulnerabilities; SEADER always customized repair suggestions correctly. Based on SEADER's outputs, we filed 60 pull requests. So far, developers of 18 projects have given positive feedback on SEADER's suggestions. Our results indicate that SEADER can effectively help developers detect and fix security API misuses. Whereas prior work either detects API misuses or suggests simple fixes, SEADER is the first tool to do both for nontrivial vulnerability repairs.
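The exemplar-driven idea (learn a fix from an <insecure, secure> pair, then apply it elsewhere) can be illustrated with a deliberately simplified sketch. SEADER itself compares snippets with data dependence analysis and applies inter-procedural static analysis. The code below substitutes a regular-expression match on `getInstance("...")` calls, so it only captures the simplest kind of template: a changed algorithm string. The function names are hypothetical.

```python
import re

# Matches calls such as Cipher.getInstance("DES"); a regex is a crude stand-in
# for the data dependence analysis that SEADER actually performs.
GET_INSTANCE = re.compile(r'(\w+)\.getInstance\("([^"]*)"\)')


def infer_fix_rules(insecure: str, secure: str) -> dict[tuple[str, str], str]:
    """Pair up getInstance calls in the two snippets and record argument changes."""
    rules: dict[tuple[str, str], str] = {}
    for (cls_bad, arg_bad), (cls_good, arg_good) in zip(
        GET_INSTANCE.findall(insecure), GET_INSTANCE.findall(secure)
    ):
        if cls_bad == cls_good and arg_bad != arg_good:
            rules[(cls_bad, arg_bad)] = arg_good
    return rules


def suggest_fixes(program: str, rules: dict[tuple[str, str], str]) -> list[tuple[int, str]]:
    """Return (1-based line number, suggested replacement line) for each matching misuse."""
    suggestions = []
    for lineno, line in enumerate(program.splitlines(), start=1):
        for cls, arg in GET_INSTANCE.findall(line):
            fix = rules.get((cls, arg))
            if fix is not None:
                suggestions.append((lineno, line.replace(f'"{arg}"', f'"{fix}"')))
    return suggestions


if __name__ == "__main__":
    rules = infer_fix_rules(
        'Cipher c = Cipher.getInstance("DES");',
        'Cipher c = Cipher.getInstance("AES/GCM/NoPadding");',
    )
    program = 'byte[] k = load();\nCipher enc = Cipher.getInstance("DES");'
    print(suggest_fixes(program, rules))
    # [(2, 'Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");')]
```

A real migration away from `DES` usually needs more than a string swap; GCM, for example, also requires a suitable key and a fresh IV. That is why a data-dependence-based template is richer than this literal-level rule.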
CR Nov 17, 2021
Privacy Guarantees of BLE Contact Tracing: A Case Study on COVIDWISE
Salman Ahmed, Ya Xiao, Taejoong et al.
Google and Apple jointly introduced a digital contact tracing technology and an API called "exposure notification" to help health organizations and governments with contact tracing. The technology and its interplay with security and privacy constraints require investigation. In this study, we examine the security, privacy, and reliability of the technology under actual and typical scenarios, with a typical expected adversary in mind, and under realistic use cases. We conduct the analysis in the context of Virginia's COVIDWISE app. This experimental analysis validates the properties of the system under these conditions. The result seems crucial for giving peace of mind to the authorities that adopt exposure notification technology, and it may also improve the system's transparency and overall user trust.
CR Jul 28, 2020
Coding Practices and Recommendations of Spring Security for Enterprise Applications
Mazharul Islam, Sazzadur Rahaman, Na Meng et al.
Spring Security is tremendously popular among practitioners because it makes enterprise applications easy to secure. In this paper, we study application framework misconfiguration vulnerabilities in the light of Spring Security, a topic that is relatively understudied in the existing literature. Toward that goal, we identify 6 types of security anti-patterns and 4 insecure defaults through a measurement-based study of 28 Spring applications. Our analysis shows that the security risks associated with the identified anti-patterns and insecure defaults can leave enterprise applications vulnerable to a wide range of high-risk attacks. To prevent these attacks, we also provide recommendations for practitioners. Our study has contributed one update to the official Spring Security documentation. Other security issues identified in this study are being considered by the Spring Security community for future major releases.
SE Jul 12, 2020
Industrial Experience of Finding Cryptographic Vulnerabilities in Large-scale Codebases
Ya Xiao, Yang Zhao, Nicholas Allen et al.
Enterprise environments often screen large-scale codebases (millions of lines of code) with static analysis tools to find bugs and vulnerabilities. Parfait is a static code analysis tool used at Oracle to find security vulnerabilities in industrial codebases. Recently, many studies have shown that misusing cryptographic APIs in Java causes complicated cryptographic vulnerabilities. In this paper, we describe how we achieve precise and scalable detection of these vulnerabilities on top of the Parfait framework. The key challenge in detecting cryptographic vulnerabilities is the high false alarm rate caused by pseudo-influences. Pseudo-influences occur when security-irrelevant constants are used in constructing security-critical values. Static analysis usually cannot distinguish them from hard-coded constants that expose sensitive information. We tackle this problem by specializing the backward dataflow analysis used in Parfait with refinement insights, an idea from the tool CryptoGuard. We evaluate our analyzer on a comprehensive Java cryptographic vulnerability benchmark and eleven large real-world applications. The results show that the Parfait-based cryptographic vulnerability detector finds real-world cryptographic vulnerabilities in large-scale codebases with high true-positive rates and low runtime cost.
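The pseudo-influence problem and the refinement idea can be illustrated with a toy backward slice. This is not Parfait's or CryptoGuard's implementation. The program is a hand-written def-use map. The two "irrelevant" operand positions, a property-name argument and a charset-name argument, are assumed examples of security-irrelevant constants. A naive slice reports every constant that reaches the key, while the refined slice skips operands known to be irrelevant.

```python
# Each variable maps to (operation, operands). For "const" the operand is the
# literal value; otherwise operands is a tuple of other variable names.
Program = dict[str, tuple[str, object]]

# Assumed refinement rules: (operation, operand index) pairs whose constants
# cannot carry key material, e.g. a property name or a charset name.
IRRELEVANT_OPERANDS = {("getProperty", 0), ("getBytes", 1)}


def constants_reaching(var: str, program: Program, refine: bool) -> set[str]:
    """Backward slice from var; return the constants that flow into it."""
    found: set[str] = set()
    stack, seen = [var], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        op, operands = program[current]
        if op == "const":
            found.add(operands)
            continue
        for index, operand in enumerate(operands):
            if refine and (op, index) in IRRELEVANT_OPERANDS:
                continue  # pseudo-influence: prune this operand
            stack.append(operand)
    return found


if __name__ == "__main__":
    # Key loaded from configuration: only a property name and a charset are constant.
    from_config = {
        "name": ("const", "db.password"),
        "secret": ("getProperty", ("name",)),
        "charset": ("const", "UTF-8"),
        "key": ("getBytes", ("secret", "charset")),
        "spec": ("SecretKeySpec", ("key",)),
    }
    # A genuinely hard-coded key.
    hard_coded = {
        "literal": ("const", "s3cr3t"),
        "charset": ("const", "UTF-8"),
        "key": ("getBytes", ("literal", "charset")),
        "spec": ("SecretKeySpec", ("key",)),
    }
    print(constants_reaching("spec", from_config, refine=False))  # both constants: a false alarm
    print(constants_reaching("spec", from_config, refine=True))   # set()
    print(constants_reaching("spec", hard_coded, refine=True))    # {'s3cr3t'}
```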
CR Feb 7, 2020
Security Certification in Payment Card Industry: Testbeds, Measurements, and Recommendations
Sazzadur Rahaman, Gang Wang, Danfeng et al.
The massive payment card industry (PCI) involves various entities such as merchants, issuer banks, acquirer banks, and card brands. Ensuring security for all entities that process payment card information is a challenging task. The PCI Security Standards Council requires all such entities to comply with the PCI Data Security Standard (DSS), which specifies a series of security requirements. However, little is known about how well PCI DSS is enforced in practice. In this paper, we take a measurement approach to systematically evaluate the PCI DSS certification process for e-commerce websites. We develop an e-commerce web application testbed, BuggyCart, which can flexibly add or remove 35 PCI DSS-related vulnerabilities. We then use the testbed to examine the capabilities and limitations of PCI scanners and the rigor of the certification process. We find an alarming gap between the security standard and its real-world enforcement. None of the 6 PCI scanners we tested is fully compliant with the PCI scanning guidelines, and they issue certificates to merchants whose websites still have major vulnerabilities. To further examine the compliance status of real-world e-commerce websites, we build a new lightweight scanning tool named PciCheckerLite and scan 1,203 e-commerce websites across various business sectors. The results show that 86% of the websites have at least one PCI DSS violation that should have disqualified them as non-compliant. Our in-depth accuracy analysis also shows that PciCheckerLite's output is more precise than that of w3af. We reached out to the PCI Security Standards Council to share our research results and help improve enforcement in practice.
CR Nov 11, 2019
Neural Cryptanalysis: Metrics, Methodology, and Applications in CPS Ciphers
Ya Xiao, Qingying Hao, Danfeng et al.
Many real-world cyber-physical systems (CPS) use proprietary cipher algorithms. In this work, we describe an easy-to-use black-box security evaluation approach that measures the strength of proprietary ciphers without requiring knowledge of the algorithms. We quantify the strength of a cipher by measuring how difficult it is for a neural network to mimic the cipher algorithm. We define new metrics (e.g., cipher match rate, training data complexity, and training time complexity), computed from neural networks, to quantitatively represent cipher strength. This measurement approach allows us to directly compare the security of ciphers. Our experimental demonstration uses fully connected neural networks with multiple parallel binary classifiers at the output layer. The results show that, when compared with round-reduced DES, Hitag2 (a popular stream cipher used in the keyless entry systems of modern cars) is weaker than 3-round DES.
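One way to read the output-layer design and the cipher match rate is sketched below. We assume that each ciphertext bit is predicted by its own binary classifier, whose output probability is thresholded at 0.5. We also assume the match rate is the fraction of ciphertext bits predicted correctly across test samples; the paper's exact metric definitions may differ. Under this reading, a rate near 0.5 means the network does no better than guessing, which is what a strong cipher should produce.

```python
def to_bits(probabilities: list[float], threshold: float = 0.5) -> list[int]:
    """Turn the outputs of per-bit binary classifiers into predicted ciphertext bits."""
    return [1 if p >= threshold else 0 for p in probabilities]


def bit_match_rate(predicted: list[list[int]], actual: list[list[int]]) -> float:
    """Fraction of ciphertext bits predicted correctly over all samples (assumed definition)."""
    if len(predicted) != len(actual):
        raise ValueError("sample counts differ")
    matched = total = 0
    for pred_bits, true_bits in zip(predicted, actual):
        if len(pred_bits) != len(true_bits):
            raise ValueError("block sizes differ")
        matched += sum(p == t for p, t in zip(pred_bits, true_bits))
        total += len(true_bits)
    return matched / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical outputs of four per-bit classifiers for two plaintexts.
    outputs = [[0.9, 0.2, 0.7, 0.4], [0.1, 0.8, 0.6, 0.3]]
    ciphertexts = [[1, 0, 0, 0], [0, 1, 1, 0]]
    print(bit_match_rate([to_bits(o) for o in outputs], ciphertexts))  # 0.875
```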
CR Oct 7, 2019
Methodologies for Quantifying (Re-)randomization Security and Timing under JIT-ROP
Salman Ahmed, Ya Xiao, Gang Tan et al.
Just-in-time return-oriented programming (JIT-ROP) allows an attacker to dynamically discover instruction pages and launch code reuse attacks, effectively bypassing most fine-grained address space layout randomization (ASLR) protections. However, in-depth questions about the impact of code (re-)randomization on code reuse attacks have not been studied. For example: How can one effectively compute the re-randomization interval by considering the speed of gadget convergence to defeat JIT-ROP attacks? How do starting pointers in JIT-ROP affect gadget availability and gadget convergence time? What impact do fine-grained code randomizations have on the Turing-complete expressive power of JIT-ROP payloads? We conduct a comprehensive measurement study of the effectiveness of fine-grained code randomization schemes, covering 5 tools, 20 applications (including 6 browsers), 1 browser engine, and 25 dynamic libraries. We provide methodologies to measure JIT-ROP gadget availability, quality, and Turing-complete expressiveness. We also provide methodologies to empirically determine the upper bound of re-randomization intervals in re-randomization schemes using the Turing-complete (TC), priority, MOV TC, and payload gadget sets. Experiments show that the upper bound ranges from 1.5 to 3.5 seconds in our tested applications. In addition, our results show that the locations of leaked pointers used in JIT-ROP attacks have no impact on gadget availability but do affect how fast attackers find gadgets. Our results also show that instruction-level single-round randomization thwarts current gadget-finding techniques under the JIT-ROP threat model.
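The re-randomization bound can be reasoned about as a race: the code must be re-randomized before an attacker has collected a usable gadget set. The sketch below assumes a log of (elapsed seconds, gadget type) discoveries for each attack run. The gadget type labels and timings are hypothetical stand-ins for the TC, priority, MOV TC, and payload gadget sets the paper measures. Taking the minimum convergence time across runs as the bound is our own conservative assumption.

```python
from typing import Optional


def convergence_time(discoveries: list[tuple[float, str]], required: set[str]) -> Optional[float]:
    """Earliest elapsed time at which every required gadget type has been found, or None."""
    if not required:
        return 0.0
    found: set[str] = set()
    for elapsed, gadget_type in sorted(discoveries):
        if gadget_type in required:
            found.add(gadget_type)
            if found == required:
                return elapsed
    return None


def interval_upper_bound(runs: list[list[tuple[float, str]]], required: set[str]) -> Optional[float]:
    """Re-randomizing faster than the quickest convergence denies the attacker a full set."""
    times = [t for t in (convergence_time(run, required) for run in runs) if t is not None]
    return min(times) if times else None


if __name__ == "__main__":
    required = {"load", "store", "arith", "branch"}  # hypothetical gadget types
    runs = [
        [(0.4, "load"), (0.9, "store"), (1.5, "arith"), (2.2, "branch")],
        [(0.3, "store"), (0.8, "arith"), (1.1, "load"), (1.9, "branch")],
        [(0.5, "load"), (1.2, "arith"), (2.8, "store")],  # never converges
    ]
    print(interval_upper_bound(runs, required))  # 1.9
```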
CR Jun 18, 2018
CryptoGuard: High Precision Detection of Cryptographic Vulnerabilities in Massive-sized Java Projects
Sazzadur Rahaman, Ya Xiao, Sharmin Afrose et al.
Cryptographic API misuses, such as exposed secrets, predictable random numbers, and vulnerable certificate verification, seriously threaten software security. The vision of automatically screening cryptographic API calls in massive-sized (e.g., millions of LoC) Java programs is not new. However, this goal has not been accomplished, hindered by the practical difficulty of reducing false positives without compromising analysis quality. State-of-the-art crypto API screening solutions are not designed to operate at a large scale. Our technical innovation is a set of fast and highly accurate slicing algorithms. Our algorithms refine program slices by identifying language-specific irrelevant elements. The refinements reduce false alerts by 76% to 80% in our experiments. Running our tool, CryptoGuard, on 46 high-impact large-scale Apache projects and 6,181 Android apps generated many security insights. Our findings helped multiple popular Apache projects, including Spark, Ranger, and Ofbiz, harden their code. We have also made substantial progress toward the science of analysis in this space, including: i) manually analyzing 1,295 Apache alerts and confirming 1,277 true positives (98.61% precision), ii) creating a benchmark with 38 basic unit cases and 74 advanced unit cases, and iii) performing an in-depth comparison with leading solutions including CrySL, SpotBugs, and Coverity. We are in the process of integrating CryptoGuard with the Software Assurance Marketplace (SWAMP).
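The reported precision follows directly from the manual alert analysis. Here TP is the number of confirmed true positives, and we assume every remaining alert counts as a false positive (FP):

```latex
\mathrm{FP} = 1295 - 1277 = 18, \qquad
\text{precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}
                 = \frac{1277}{1277 + 18} = \frac{1277}{1295} \approx 0.9861 = 98.61\%
```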