Danfeng Yao

h-index41

7papers

304citations

Novelty34%

AI Score38

Ranked #86,605 of 194,257 authors (top 45%)#2,109 in CR (top 31%)

7 Papers

6.7CRMay 5

Generating Proof-of-Vulnerability Tests to Help Enhance the Security of Complex Software

Shravya Kanchi, Xiaoyan Zang, Ying Zhang et al.

Developers create modern software applications (Apps) on top of third-party libraries (Libs). When library vulnerabilities are reachable through application code, the applications can be vulnerable to software supply chain attacks. Prior work shows that developers often require concrete and executable evidence, i.e., proof-of-vulnerability (PoV) tests, to decide whether a reported dependency vulnerability poses a practical security risk to their application. However, manually crafting such tests is challenging, and existing tool support is insufficient to automate the procedure. To streamline test generation, we created PoVSmith -- a new approach that combines call path analysis, exemplar test, code context, and feedback into multiple prompts to guide a coding agent (i.e., Codex) and a large language model (i.e., GPT) for test generation, execution, and assessment. We evaluated PoVSmith on 33 $\langle$App, Lib$\rangle$ Java program pairs, where each App depends on a vulnerable Lib. PoVSmith revealed 158 unique application-level entry points (i.e., public methods) calling vulnerable library APIs; 152 (96\%) of them were correctly found, together with the call paths properly recognized. With such method call information, PoVSmith generated 152 tests, 84 (55\%) of which demonstrated feasible ways of attacking Apps by exploiting Lib vulnerabilities. PoVSmith substantially outperforms the state-of-the-art LLM-based approach, as it reduces human involvement while dramatically improving test quality. Our work contributes (1) a novel approach of agent-based test generation, (2) an iterative code refinement process driven by execution feedback, and (3) LLM-based quality assessment grounded in both the test context and execution logs.

6.6CRNov 17, 2021

Privacy Guarantees of BLE Contact Tracing: A Case Study on COVIDWISE

Salman Ahmed, Ya Xiao, Taejoong et al.

Google and Apple jointly introduced a digital contact tracing technology and an API called "exposure notification," to help health organizations and governments with contact tracing. The technology and its interplay with security and privacy constraints require investigation. In this study, we examine and analyze the security, privacy, and reliability of the technology with actual and typical scenarios (and expected typical adversary in mind), and quite realistic use cases. We do it in the context of Virginia's COVIDWISE app. This experimental analysis validates the properties of the system under the above conditions, a result that seems crucial for the peace of mind of the exposure notification technology adopting authorities, and may also help with the system's transparency and overall user trust.

3.6SEMar 15, 2021

Embedding Code Contexts for Cryptographic API Suggestion:New Methodologies and Comparisons

Ya Xiao, Salman Ahmed, Wenjia Song et al.

Despite recent research efforts, the vision of automatic code generation through API recommendation has not been realized. Accuracy and expressiveness challenges of API recommendation needs to be systematically addressed. We present a new neural network-based approach, Multi-HyLSTM for API recommendation --targeting cryptography-related code. Multi-HyLSTM leverages program analysis to guide the API embedding and recommendation. By analyzing the data dependence paths of API methods, we train embedding and specialize a multi-path neural network architecture for API recommendation tasks that accurately predict the next API method call. We address two previously unreported programming language-specific challenges, differentiating functionally similar APIs and capturing low-frequency long-range influences. Our results confirm the effectiveness of our design choices, including program-analysis-guided embedding, multi-path code suggestion architecture, and low-frequency long-range-enhanced sequence learning, with high accuracy on top-1 recommendations. We achieve a top-1 accuracy of 91.41% compared with 77.44% from the state-of-the-art tool SLANG. In an analysis of 245 test cases, compared with the commercial tool Codota, we achieve a top-1 recommendation accuracy of 88.98%, which is significantly better than Codota's accuracy of 64.90%. We publish our data and code as a large Java cryptographic code dataset.

26.6CRMar 30, 2020

Deep Learning-Based Anomaly Detection in Cyber-Physical Systems: Progress and Opportunities

Yuan Luo, Ya Xiao, Long Cheng et al.

Anomaly detection is crucial to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of CPSs and more sophisticated attacks, conventional anomaly detection methods, which face the growing volume of data and need domain-specific knowledge, cannot be directly applied to address these challenges. To this end, deep learning-based anomaly detection (DLAD) methods have been proposed. In this paper, we review state-of-the-art DLAD methods in CPSs. We propose a taxonomy in terms of the type of anomalies, strategies, implementation, and evaluation metrics to understand the essential properties of current methods. Further, we utilize this taxonomy to identify and highlight new characteristics and designs in each CPS domain. Also, we discuss the limitations and open problems of these methods. Moreover, to give users insights into choosing proper DLAD methods in practice, we experimentally explore the characteristics of typical neural models, the workflow of DLAD methods, and the running performance of DL models. Finally, we discuss the deficiencies of DL approaches, our findings, and possible directions to improve DLAD methods and motivate future research.

9.7CRFeb 22, 2019

Exploitation Techniques and Defenses for Data-Oriented Attacks

Long Cheng, Hans Liljestrand, Thomas Nyman et al.

Data-oriented attacks manipulate non-control data to alter a program's benign behavior without violating its control-flow integrity. It has been shown that such attacks can cause significant damage even in the presence of control-flow defense mechanisms. However, these threats have not been adequately addressed. In this SoK paper, we first map data-oriented exploits, including Data-Oriented Programming (DOP) attacks, to their assumptions/requirements and attack capabilities. We also compare known defenses against these attacks, in terms of approach, detection capabilities, overhead, and compatibility. Then, we experimentally assess the feasibility of a detection approach that is based on the Intel Processor Trace (PT) technology. PT only traces control flows, thus, is generally believed to be not useful for data-oriented security. However, our work reveals that data-oriented attacks (in particular the recent DOP attacks) may generate side-effects on control-flow behavior in multiple dimensions, which manifest in PT traces. Based on this evaluation, we discuss challenges for building deployable data-oriented defenses and open research questions.

10.6CRApr 30, 2018

Checking is Believing: Event-Aware Program Anomaly Detection in Cyber-Physical Systems

Long Cheng, Ke Tian, Danfeng Yao et al.

Securing cyber-physical systems (CPS) against malicious attacks is of paramount importance because these attacks may cause irreparable damages to physical systems. Recent studies have revealed that control programs running on CPS devices suffer from both control-oriented attacks (e.g., code-injection or code-reuse attacks) and data-oriented attacks (e.g., non-control data attacks). Unfortunately, existing detection mechanisms are insufficient to detect runtime data-oriented exploits, due to the lack of runtime execution semantics checking. In this work, we propose Orpheus, a new security methodology for defending against data-oriented attacks by enforcing cyber-physical execution semantics. We first present a general method for reasoning cyber-physical execution semantics of a control program (i.e., causal dependencies between the physical context and program control flows), including the event identification and dependence analysis. As an instantiation of Orpheus, we then present a new program behavior model, i.e., the event-aware finite-state automaton (eFSA). eFSA takes advantage of the event-driven nature of CPS control programs and incorporates event checking in anomaly detection. It detects data-oriented exploits if a specific physical event is missing along with the corresponding event dependent state transition. We evaluate our prototype's performance by conducting case studies under data-oriented attacks. Results show that eFSA can successfully detect different runtime attacks. Our prototype on Raspberry Pi incurs a low overhead, taking 0.0001s for each state transition integrity checking, and 0.063s~0.211s for the cyber-physical contextual consistency checking.

9.1CRJan 18, 2017

Breaking the Target: An Analysis of Target Data Breach and Lessons Learned

Xiaokui Shu, Ke Tian, Andrew Ciambrone et al.

This paper investigates and examines the events leading up to the second most devastating data breach in history: the attack on the Target Corporation. It includes a thorough step-by-step analysis of this attack and a comprehensive anatomy of the malware named BlackPOS. Also, this paper provides insight into the legal aspect of cybercrimes, along with a prosecution and sentence example of the well-known TJX case. Furthermore, we point out an urgent need for improving security mechanisms in existing systems of merchants and propose three security guidelines and defenses. Credit card security is discussed at the end of the paper with several best practices given to customers to hide their card information in purchase transactions.