Ya Xiao

CR
9 papers
398 citations
Novelty 38%
AI Score 25

9 Papers

CR | Dec 7, 2021 | Code
Evaluation of Static Vulnerability Detection Tools with Java Cryptographic API Benchmarks

Sharmin Afrose, Ya Xiao, Sazzadur Rahaman et al.

Several studies showed that misuses of cryptographic APIs are common in real-world code (e.g., Apache projects and Android apps). There exist several open-sourced and commercial security tools that automatically screen Java programs to detect misuses. To compare their accuracy and security guarantees, we develop two comprehensive benchmarks named CryptoAPI-Bench and ApacheCryptoAPI-Bench. CryptoAPI-Bench consists of 181 unit test cases that cover basic cases, as well as complex cases, including interprocedural, field sensitive, multiple class test cases, and path sensitive data flow of misuse cases. The benchmark also includes correct cases for testing false-positive rates. The ApacheCryptoAPI-Bench consists of 121 cryptographic cases from 10 Apache projects. We evaluate four tools, namely, SpotBugs, CryptoGuard, CrySL, and Coverity using both benchmarks. We present their performance and comparative analysis. The ApacheCryptoAPI-Bench also examines the scalability of the tools. Our benchmarks are useful for advancing state-of-the-art solutions in the space of misuse detection.

CR | Feb 13, 2021 | Code
Data-Driven Vulnerability Detection and Repair in Java Code

Ying Zhang, Mahir Kabir, Ya Xiao et al.

The Java platform provides various APIs to facilitate secure coding. However, correctly using security APIs is usually challenging for developers who lack cybersecurity training. Prior work shows that many developers misuse security APIs; such misuses can introduce vulnerabilities into software, void security protections, and expose exploitable weaknesses to attackers. To eliminate such API-related vulnerabilities, this paper presents SEADER, our new approach that detects and repairs security API misuses. Given an exemplar insecure code snippet and its secure counterpart, SEADER compares the snippets and conducts data dependence analysis to infer the security API misuse templates and corresponding fixing operations. Based on the inferred information, given a program, SEADER performs inter-procedural static analysis to search for any security API misuse and to propose customized fixing suggestions for those vulnerabilities. To evaluate SEADER, we applied it to 25 <insecure, secure> code pairs, and SEADER successfully inferred 18 unique API misuse templates and related fixes. With these vulnerability repair patterns, we further applied SEADER to 10 open-source projects that contain in total 32 known vulnerabilities. Our experiment shows that SEADER detected vulnerabilities with 100% precision, 84% recall, and 91% accuracy. Additionally, we applied SEADER to 100 Apache open-source projects and detected 988 vulnerabilities; SEADER always customized repair suggestions correctly. Based on SEADER's outputs, we filed 60 pull requests. So far, developers of 18 projects have offered positive feedback on SEADER's suggestions. Our results indicate that SEADER can effectively help developers detect and fix security API misuses. Whereas prior work either detects API misuses or suggests simple fixes, SEADER is the first tool to do both for nontrivial vulnerability repairs.

CR | Nov 17, 2021
Privacy Guarantees of BLE Contact Tracing: A Case Study on COVIDWISE

Salman Ahmed, Ya Xiao, Taejoong et al.

Google and Apple jointly introduced a digital contact tracing technology and an API called "exposure notification" to help health organizations and governments with contact tracing. The technology and its interplay with security and privacy constraints require investigation. In this study, we examine and analyze the security, privacy, and reliability of the technology under actual and typical scenarios (with the expected typical adversary in mind) and realistic use cases. We do so in the context of Virginia's COVIDWISE app. This experimental analysis validates the properties of the system under the above conditions, a result that seems crucial for the peace of mind of the authorities adopting the exposure notification technology, and may also help with the system's transparency and overall user trust.

SE | Mar 15, 2021
Embedding Code Contexts for Cryptographic API Suggestion: New Methodologies and Comparisons

Ya Xiao, Salman Ahmed, Wenjia Song et al.

Despite recent research efforts, the vision of automatic code generation through API recommendation has not been realized. The accuracy and expressiveness challenges of API recommendation need to be systematically addressed. We present a new neural network-based approach, Multi-HyLSTM, for API recommendation, targeting cryptography-related code. Multi-HyLSTM leverages program analysis to guide the API embedding and recommendation. By analyzing the data dependence paths of API methods, we train embeddings and specialize a multi-path neural network architecture for API recommendation tasks that accurately predict the next API method call. We address two previously unreported programming language-specific challenges: differentiating functionally similar APIs and capturing low-frequency long-range influences. Our results confirm the effectiveness of our design choices, including program-analysis-guided embedding, multi-path code suggestion architecture, and low-frequency long-range-enhanced sequence learning, with high accuracy on top-1 recommendations. We achieve a top-1 accuracy of 91.41% compared with 77.44% from the state-of-the-art tool SLANG. In an analysis of 245 test cases, compared with the commercial tool Codota, we achieve a top-1 recommendation accuracy of 88.98%, which is significantly better than Codota's accuracy of 64.90%. We publish our data and code as a large Java cryptographic code dataset.

SE | Jul 12, 2020
Industrial Experience of Finding Cryptographic Vulnerabilities in Large-scale Codebases

Ya Xiao, Yang Zhao, Nicholas Allen et al.

Enterprise environments often screen large-scale (millions of lines of code) codebases with static analysis tools to find bugs and vulnerabilities. Parfait is a static code analysis tool used at Oracle to find security vulnerabilities in industrial codebases. Recently, many studies have shown that there are complicated cryptographic vulnerabilities caused by misusing cryptographic APIs in Java. In this paper, we describe how we realize precise and scalable detection of these complicated cryptographic vulnerabilities based on the Parfait framework. The key challenge in the detection of cryptographic vulnerabilities is the high false alarm rate caused by pseudo-influences. Pseudo-influences happen if security-irrelevant constants are used in constructing security-critical values. Static analysis is usually unable to distinguish them from hard-coded constants that expose sensitive information. We tackle this problem by specializing the backward dataflow analysis used in Parfait with refinement insights, an idea from the tool CryptoGuard. We evaluate our analyzer on a comprehensive Java cryptographic vulnerability benchmark and eleven large real-world applications. The results show that the Parfait-based cryptographic vulnerability detector can find real-world cryptographic vulnerabilities in large-scale codebases with high true-positive rates and low runtime cost.
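To make the pseudo-influence idea concrete, here is a minimal Java sketch. It is not taken from Parfait or CryptoGuard; the class and method names are hypothetical, and the buffer-size constant is just one plausible instance of a security-irrelevant constant. In both methods a constant reaches the key through data flow, so a naive backward analysis might flag both, yet only the first one actually hard-codes key material.

```java
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical illustration; not code from Parfait or CryptoGuard.
public class PseudoInfluenceDemo {

    // Real misuse: the key material itself is a hard-coded constant.
    public static SecretKeySpec hardCodedKey() {
        byte[] keyBytes = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        return new SecretKeySpec(keyBytes, "AES");
    }

    // Pseudo-influence: the constant 16 is reachable from the key in a
    // backward slice, but it only sizes the buffer. The key bytes
    // themselves come from SecureRandom.
    public static SecretKeySpec randomKey() {
        byte[] keyBytes = new byte[16];
        new SecureRandom().nextBytes(keyBytes);
        return new SecretKeySpec(keyBytes, "AES");
    }

    public static void main(String[] args) {
        boolean sameHardCoded = Arrays.equals(
                hardCodedKey().getEncoded(), hardCodedKey().getEncoded());
        boolean sameRandom = Arrays.equals(
                randomKey().getEncoded(), randomKey().getEncoded());
        System.out.println("hard-coded key repeats: " + sameHardCoded);
        System.out.println("random key repeats: " + sameRandom);
    }
}
```

An analysis that reports every constant on the slice would raise an alert for `randomKey()` as well; filtering out constants that only act as sizes or similar bookkeeping values is the kind of refinement the abstract refers to.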

CR | Mar 30, 2020
Deep Learning-Based Anomaly Detection in Cyber-Physical Systems: Progress and Opportunities

Yuan Luo, Ya Xiao, Long Cheng et al.

Anomaly detection is crucial to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of CPSs and more sophisticated attacks, conventional anomaly detection methods, which face the growing volume of data and need domain-specific knowledge, cannot be directly applied to address these challenges. To this end, deep learning-based anomaly detection (DLAD) methods have been proposed. In this paper, we review state-of-the-art DLAD methods in CPSs. We propose a taxonomy in terms of the type of anomalies, strategies, implementation, and evaluation metrics to understand the essential properties of current methods. Further, we utilize this taxonomy to identify and highlight new characteristics and designs in each CPS domain. Also, we discuss the limitations and open problems of these methods. Moreover, to give users insights into choosing proper DLAD methods in practice, we experimentally explore the characteristics of typical neural models, the workflow of DLAD methods, and the running performance of DL models. Finally, we discuss the deficiencies of DL approaches, our findings, and possible directions to improve DLAD methods and motivate future research.

CR | Nov 11, 2019
Neural Cryptanalysis: Metrics, Methodology, and Applications in CPS Ciphers

Ya Xiao, Qingying Hao, Danfeng et al.

Many real-world cyber-physical systems (CPS) use proprietary cipher algorithms. In this work, we describe an easy-to-use black-box security evaluation approach to measure the strength of proprietary ciphers without having to know the algorithms. We quantify the strength of a cipher by measuring how difficult it is for a neural network to mimic the cipher algorithm. We define new metrics (e.g., cipher match rate, training data complexity and training time complexity) that are computed from neural networks to quantitatively represent the cipher strength. This measurement approach allows us to directly compare the security of ciphers. Our experimental demonstration utilizes fully connected neural networks with multiple parallel binary classifiers at the output layer. The results show that when compared with round-reduced DES, the security strength of Hitag2 (a popular stream cipher used in the keyless entry of modern cars) is weaker than 3-round DES.

CR | Oct 7, 2019
Methodologies for Quantifying (Re-)randomization Security and Timing under JIT-ROP

Salman Ahmed, Ya Xiao, Gang Tan et al.

Just-in-time return-oriented programming (JIT-ROP) allows one to dynamically discover instruction pages and launch code reuse attacks, effectively bypassing most fine-grained address space layout randomization (ASLR) protection. However, in-depth questions regarding the impact of code (re-)randomization on code reuse attacks have not been studied. For example: How would one compute the re-randomization interval effectively by considering the speed of gadget convergence to defeat JIT-ROP attacks? How do starting pointers in JIT-ROP impact gadget availability and gadget convergence time? What impact do fine-grained code randomizations have on the Turing-complete expressive power of JIT-ROP payloads? We conduct a comprehensive measurement study on the effectiveness of fine-grained code randomization schemes, with 5 tools, 20 applications including 6 browsers, 1 browser engine, and 25 dynamic libraries. We provide methodologies to measure JIT-ROP gadget availability, quality, and their Turing-complete expressiveness, as well as to empirically determine the upper bound of re-randomization intervals in re-randomization schemes using the Turing-complete (TC), priority, MOV TC, and payload gadget sets. Experiments show that the upper bound ranges from 1.5 to 3.5 seconds in our tested applications. In addition, our results show that the locations of leaked pointers used in JIT-ROP attacks have no impact on gadget availability, but do affect how fast attackers find gadgets. Our results also show that instruction-level single-round randomization thwarts current gadget finding techniques under the JIT-ROP threat model.

CR | Jun 18, 2018
CryptoGuard: High Precision Detection of Cryptographic Vulnerabilities in Massive-sized Java Projects

Sazzadur Rahaman, Ya Xiao, Sharmin Afrose et al.

Cryptographic API misuses, such as exposed secrets, predictable random numbers, and vulnerable certificate verification, seriously threaten software security. The vision of automatically screening cryptographic API calls in massive-sized (e.g., millions of LoC) Java programs is not new. However, hindered by the practical difficulty of reducing false positives without compromising analysis quality, this goal has not been accomplished. State-of-the-art crypto API screening solutions are not designed to operate on a large scale. Our technical innovation is a set of fast and highly accurate slicing algorithms. Our algorithms refine program slices by identifying language-specific irrelevant elements. The refinements reduce false alerts by 76% to 80% in our experiments. Running our tool, CryptoGuard, on 46 high-impact large-scale Apache projects and 6,181 Android apps generates many security insights. Our findings helped multiple popular Apache projects to harden their code, including Spark, Ranger, and Ofbiz. We have also made substantial progress towards the science of analysis in this space, including: i) manually analyzing 1,295 Apache alerts and confirming 1,277 true positives (98.61% precision), ii) creating a benchmark with 38-unit basic cases and 74-unit advanced cases, iii) performing an in-depth comparison with leading solutions including CrySL, SpotBugs, and Coverity. We are in the process of integrating CryptoGuard with the Software Assurance Marketplace (SWAMP).
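The precision figure quoted in this abstract follows directly from the two alert counts it gives. The arithmetic below is only a sanity check on those numbers and introduces no new data.

```latex
\[
\text{precision}
  = \frac{\text{confirmed true positives}}{\text{alerts manually analyzed}}
  = \frac{1277}{1295}
  \approx 0.9861
  = 98.61\%
\]
```

Equivalently, 1295 - 1277 = 18 of the manually analyzed Apache alerts were not confirmed as true positives.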