Xinyu Xing

LG
h-index12
27papers
1,728citations
Novelty51%
AI Score50

27 Papers

AISep 19, 2023Code
GPTFUZZER: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts

Jiahao Yu, Xingwei Lin, Zheng Yu et al.

Large language models (LLMs) have recently experienced tremendous popularity and are widely used from casual conversations to AI-driven programming. However, despite their considerable success, LLMs are not entirely reliable and can give detailed guidance on how to conduct harmful or illegal activities. While safety measures can reduce the risk of such outputs, adversarial jailbreak attacks can still exploit LLMs to produce harmful content. These jailbreak templates are typically manually crafted, making large-scale testing challenging. In this paper, we introduce GPTFuzz, a novel black-box jailbreak fuzzing framework inspired by the AFL fuzzing framework. Instead of manual engineering, GPTFuzz automates the generation of jailbreak templates for red-teaming LLMs. At its core, GPTFuzz starts with human-written templates as initial seeds, then mutates them to produce new templates. We detail three key components of GPTFuzz: a seed selection strategy for balancing efficiency and variability, mutate operators for creating semantically equivalent or similar sentences, and a judgment model to assess the success of a jailbreak attack. We evaluate GPTFuzz against various commercial and open-source LLMs, including ChatGPT, LLaMa-2, and Vicuna, under diverse attack scenarios. Our results indicate that GPTFuzz consistently produces jailbreak templates with a high success rate, surpassing human-crafted templates. Remarkably, GPTFuzz achieves over 90% attack success rates against ChatGPT and Llama-2 models, even with suboptimal initial seed templates. We anticipate that GPTFuzz will be instrumental for researchers and practitioners in examining LLM robustness and will encourage further exploration into enhancing LLM safety.

CRNov 20, 2023
Assessing Prompt Injection Risks in 200+ Custom GPTs

Jiahao Yu, Yuhang Wu, Dong Shu et al.

In the rapidly evolving landscape of artificial intelligence, ChatGPT has been widely used in various applications. The new feature - customization of ChatGPT models by users to cater to specific needs has opened new frontiers in AI utility. However, this study reveals a significant security vulnerability inherent in these user-customized GPTs: prompt injection attacks. Through comprehensive testing of over 200 user-designed GPT models via adversarial prompts, we demonstrate that these systems are susceptible to prompt injections. Through prompt injection, an adversary can not only extract the customized system prompts but also access the uploaded files. This paper provides a first-hand analysis of the prompt injection, alongside the evaluation of the possible mitigation of such attacks. Our findings underscore the urgent need for robust security frameworks in the design and deployment of customizable GPT models. The intent of this paper is to raise awareness and prompt action in the AI community, ensuring that the benefits of GPT customization do not come at the cost of compromised security and privacy.

24.5SEMar 20
Patch Validation in Automated Vulnerability Repair

Zheng Yu, Wenxuan Shi, Xinqian Sun et al.

Automated Vulnerability Repair (AVR) systems, especially those leveraging large language models (LLMs), have demonstrated promising results in patching vulnerabilities -- that is, if we trust their patch validation methodology. Ground-truth patches from human developers often come with new tests that not only ensure mitigation of the vulnerability but also encode extra semantics such as root cause location, optimal fix strategy, or subtle coding styles or conventions. And yet, none of the recent AVR systems verify that the auto-generated patches additionally pass these new tests (termed as $\text{PoC}^+$ tests). This is a subtle yet critical omission. To fill this gap, we constructed a benchmark, $\textrm{PVBench}$, with 209 cases spanning 20 projects. Each case includes basic tests (functional tests before the patch and the PoC exploit) as well as the associated $\text{PoC}^+$ tests. Evaluated on three state-of-the-art AVR systems, we find that over 40\% of patches validated as correct by basic tests fail under $\text{PoC}^+$ testing, revealing substantial overestimation on patch success rates. Analyzing these patches that are falsely labeled as correct, we suggest that AVR tools should improve in three critical areas: root cause analysis, adherence to program specifications, and capturing developer intention.

LGMay 5, 2024
RICE: Breaking Through the Training Bottlenecks of Reinforcement Learning with Explanation

Zelei Cheng, Xian Wu, Jiahao Yu et al.

Deep reinforcement learning (DRL) is playing an increasingly important role in real-world applications. However, obtaining an optimally performing DRL agent for complex tasks, especially with sparse rewards, remains a significant challenge. The training of a DRL agent can be often trapped in a bottleneck without further progress. In this paper, we propose RICE, an innovative refining scheme for reinforcement learning that incorporates explanation methods to break through the training bottlenecks. The high-level idea of RICE is to construct a new initial state distribution that combines both the default initial states and critical states identified through explanation methods, thereby encouraging the agent to explore from the mixed initial states. Through careful design, we can theoretically guarantee that our refining scheme has a tighter sub-optimality bound. We evaluate RICE in various popular RL environments and real-world applications. The results demonstrate that RICE significantly outperforms existing refining schemes in enhancing agent performance.

LGFeb 8, 2025
A Survey on Explainable Deep Reinforcement Learning

Zelei Cheng, Jiahao Yu, Xinyu Xing

Deep Reinforcement Learning (DRL) has achieved remarkable success in sequential decision-making tasks across diverse domains, yet its reliance on black-box neural architectures hinders interpretability, trust, and deployment in high-stakes applications. Explainable Deep Reinforcement Learning (XRL) addresses these challenges by enhancing transparency through feature-level, state-level, dataset-level, and model-level explanation techniques. This survey provides a comprehensive review of XRL methods, evaluates their qualitative and quantitative assessment frameworks, and explores their role in policy refinement, adversarial robustness, and security. Additionally, we examine the integration of reinforcement learning with Large Language Models (LLMs), particularly through Reinforcement Learning from Human Feedback (RLHF), which optimizes AI alignment with human preferences. We conclude by highlighting open research challenges and future directions to advance the development of interpretable, reliable, and accountable DRL systems.

AIOct 18, 2024
Soft-Label Integration for Robust Toxicity Classification

Zelei Cheng, Xian Wu, Jiahao Yu et al.

Toxicity classification in textual content remains a significant problem. Data with labels from a single annotator fall short of capturing the diversity of human perspectives. Therefore, there is a growing need to incorporate crowdsourced annotations for training an effective toxicity classifier. Additionally, the standard approach to training a classifier using empirical risk minimization (ERM) may fail to address the potential shifts between the training set and testing set due to exploiting spurious correlations. This work introduces a novel bi-level optimization framework that integrates crowdsourced annotations with the soft-labeling technique and optimizes the soft-label weights by Group Distributionally Robust Optimization (GroupDRO) to enhance the robustness against out-of-distribution (OOD) risk. We theoretically prove the convergence of our bi-level optimization algorithm. Experimental results demonstrate that our approach outperforms existing baseline methods in terms of both average and worst-group accuracy, confirming its effectiveness in leveraging crowdsourced annotations to achieve more effective and robust toxicity classification.

AISep 19, 2025
GPO: Learning from Critical Steps to Improve LLM Reasoning

Jiahao Yu, Zelei Cheng, Xian Wu et al.

Large language models (LLMs) are increasingly used in various domains, showing impressive potential on different tasks. Recently, reasoning LLMs have been proposed to improve the \textit{reasoning} or \textit{thinking} capabilities of LLMs to solve complex problems. Despite the promising results of reasoning LLMs, enhancing the multi-step reasoning capabilities of LLMs still remains a significant challenge. While existing optimization methods have advanced the LLM reasoning capabilities, they often treat reasoning trajectories as a whole, without considering the underlying critical steps within the trajectory. In this paper, we introduce \textbf{G}uided \textbf{P}ivotal \textbf{O}ptimization (GPO), a novel fine-tuning strategy that dives into the reasoning process to enable more effective improvements. GPO first identifies the `critical step' within a reasoning trajectory - a point that the model must carefully proceed to succeed at the problem. We locate the critical step by estimating the advantage function. GPO then resets the policy to the critical step, samples the new rollout and prioritizes the learning process on those rollouts. This focus allows the model to learn more effectively from pivotal moments within the reasoning process to improve the reasoning performance. We demonstrate that GPO is a general strategy that can be integrated with various optimization methods to improve reasoning performance. Besides theoretical analysis, our experiments across challenging reasoning benchmarks show that GPO can consistently and significantly enhance the performance of existing optimization methods, showcasing its effectiveness and generalizability in improving LLM reasoning by concentrating on pivotal moments within the generation process.

AISep 15, 2025
Building Coding Agents via Entropy-Enhanced Multi-Turn Preference Optimization

Jiahao Yu, Zelei Cheng, Xian Wu et al.

Software engineering presents complex, multi-step challenges for Large Language Models (LLMs), requiring reasoning over large codebases and coordinated tool use. The difficulty of these tasks is exemplified by benchmarks like SWE-bench, where current LLMs still struggle to resolve real-world issues. A promising approach to enhance performance is test-time scaling (TTS), but its gains are heavily dependent on the diversity of model outputs. While standard alignment methods such as Direct Preference Optimization (DPO) and Kahneman-Tversky Optimization (KTO) are effective at aligning model outputs with human preferences, this process can come at the cost of reduced diversity, limiting the effectiveness of TTS. Additionally, existing preference optimization algorithms are typically designed for single-turn tasks and do not fully address the complexities of multi-turn reasoning and tool integration required for interactive coding agents. To bridge this gap, we introduce EntroPO, an entropy-enhanced framework that adapts existing preference optimization algorithms to the multi-turn, tool-assisted setting. EntroPO augments the preference objective to explicitly preserve policy entropy and generalizes learning to optimize over multi-turn interactions rather than single-turn responses. We validate EntroPO by fine-tuning a diverse suite of models from different families and sizes (up to 106B parameters). To maximize performance gains from TTS, we further propose a hybrid best-trajectory selection scheme combining a learned verifier model with model free approaches. On the \swebench leaderboard, our approach establishes new state-of-the-art results among open-weight models. A 30B parameter model trained with EntroPO ranks 1st on \lite and 4th on \verified on the open-weight leaderboard, surpassed only by models with over 10x more parameters(\eg$>$350B).

CLMar 10, 2025
UC-MOA: Utility-Conditioned Multi-Objective Alignment for Distributional Pareto-Optimality

Zelei Cheng, Xin-Qiang Cai, Yuting Tang et al.

Reinforcement Learning from Human Feedback (RLHF) has become a cornerstone for aligning large language models (LLMs) with human values. However, existing approaches struggle to capture the multi-dimensional, distributional nuances of human preferences. Methods such as RiC that directly inject raw reward values into prompts face significant numerical sensitivity issues--for instance, LLMs may fail to distinguish between 9.11 and 9.8--while alternatives like MORLHF, Rewarded Soups, and MODPO incur high computational costs by training multiple models. In this work, we introduce Utility-Conditioned Multi-Objective Alignment (UC-MOA), a novel framework that overcomes these limitations. Our approach leverages a diverse set of strictly increasing, non-linear utility functions to transform user-specified preferences into symbolic tokens, which are then used to condition a single LLM. This design not only mitigates numerical reasoning challenges but also substantially reduces training overhead, yielding models that achieve superior Pareto fronts and robust alignment across complex reward dimensions.

CLJun 3, 2024
Decoupled Alignment for Robust Plug-and-Play Adaptation

Haozheng Luo, Jiahao Yu, Wenxin Zhang et al.

We introduce a low-resource safety enhancement method for aligning large language models (LLMs) without the need for supervised fine-tuning (SFT) or reinforcement learning from human feedback (RLHF). Our main idea is to exploit knowledge distillation to extract the alignment information from existing well-aligned LLMs and integrate it into unaligned LLMs in a plug-and-play fashion. Methodology, we employ delta debugging to identify the critical components of knowledge necessary for effective distillation. On the harmful question dataset, our method significantly enhances the average defense success rate by approximately 14.41%, reaching as high as 51.39%, in 17 unaligned pre-trained LLMs, without compromising performance.

PLFeb 26, 2022
Preventing Timing Side-Channels via Security-Aware Just-In-Time Compilation

Qi Qin, JulianAndres JiYang, Fu Song et al.

Recent work has shown that Just-In-Time (JIT) compilation can introduce timing side-channels to constant-time programs, which would otherwise be a principled and effective means to counter timing attacks. In this paper, we propose a novel approach to eliminate JIT-induced leaks from these programs. Specifically, we present an operational semantics and a formal definition of constant-time programs under JIT compilation, laying the foundation for reasoning about programs with JIT compilation. We then propose to eliminate JIT-induced leaks via a fine-grained JIT compilation for which we provide an automated approach to generate policies and a novel type system to show its soundness. We develop a tool DeJITLeak for Java based on our approach and implement the fine-grained JIT compilation in HotSpot. Experimental results show that DeJITLeak can effectively and efficiently eliminate JIT-induced leaks on three datasets used in side-channel detection

CRMay 2, 2021
BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning

Lun Wang, Zaynah Javed, Xian Wu et al.

Recent research has confirmed the feasibility of backdoor attacks in deep reinforcement learning (RL) systems. However, the existing attacks require the ability to arbitrarily modify an agent's observation, constraining the application scope to simple RL systems such as Atari games. In this paper, we migrate backdoor attacks to more complex RL systems involving multiple agents and explore the possibility of triggering the backdoor without directly manipulating the agent's observation. As a proof of concept, we demonstrate that an adversary agent can trigger the backdoor of the victim agent with its own action in two-player competitive RL systems. We prototype and evaluate BACKDOORL in four competitive environments. The results show that when the backdoor is activated, the winning rate of the victim drops by 17% to 37% compared to when not activated.

LGFeb 3, 2020
DANCE: Enhancing saliency maps using decoys

Yang Lu, Wenbo Guo, Xinyu Xing et al.

Saliency methods can make deep neural network predictions more interpretable by identifying a set of critical features in an input sample, such as pixels that contribute most strongly to a prediction made by an image classifier. Unfortunately, recent evidence suggests that many saliency methods poorly perform, especially in situations where gradients are saturated, inputs contain adversarial perturbations, or predictions rely upon inter-feature dependence. To address these issues, we propose a framework that improves the robustness of saliency methods by following a two-step procedure. First, we introduce a perturbation mechanism that subtly varies the input sample without changing its intermediate representations. Using this approach, we can gather a corpus of perturbed data samples while ensuring that the perturbed and original input samples follow the same distribution. Second, we compute saliency maps for the perturbed samples and propose a new method to aggregate saliency maps. With this design, we offset the gradient saturation influence upon interpretation. From a theoretical perspective, we show the aggregated saliency map could not only capture inter-feature dependence but, more importantly, robustify interpretation against previously described adversarial perturbation methods. Following our theoretical analysis, we present experimental results suggesting that, both qualitatively and quantitatively, our saliency method outperforms existing methods.

CRAug 2, 2019
TABOR: A Highly Accurate Approach to Inspecting and Restoring Trojan Backdoors in AI Systems

Wenbo Guo, Lun Wang, Xinyu Xing et al.

A trojan backdoor is a hidden pattern typically implanted in a deep neural network. It could be activated and thus forces that infected model behaving abnormally only when an input data sample with a particular trigger present is fed to that model. As such, given a deep neural network model and clean input samples, it is very challenging to inspect and determine the existence of a trojan backdoor. Recently, researchers design and develop several pioneering solutions to address this acute problem. They demonstrate the proposed techniques have a great potential in trojan detection. However, we show that none of these existing techniques completely address the problem. On the one hand, they mostly work under an unrealistic assumption (e.g. assuming availability of the contaminated training database). On the other hand, the proposed techniques cannot accurately detect the existence of trojan backdoors, nor restore high-fidelity trojan backdoor images, especially when the triggers pertaining to the trojan vary in size, shape and position. In this work, we propose TABOR, a new trojan detection technique. Conceptually, it formalizes a trojan detection task as a non-convex optimization problem, and the detection of a trojan backdoor as the task of resolving the optimization through an objective function. Different from the existing technique also modeling trojan detection as an optimization problem, TABOR designs a new objective function--under the guidance of explainable AI techniques as well as heuristics--that could guide optimization to identify a trojan backdoor in a more effective fashion. In addition, TABOR defines a new metric to measure the quality of a trojan backdoor identified. Using an anomaly detection method, we show the new metric could better facilitate TABOR to identify intentionally injected triggers in an infected model and filter out false alarms......

SEMay 25, 2019
PTrix: Efficient Hardware-Assisted Fuzzing for COTS Binary

Yaohui Chen, Dongliang Mu, Jun Xu et al.

Despite its effectiveness in uncovering software defects, American Fuzzy Lop (AFL), one of the best grey-box fuzzers, is inefficient when fuzz-testing source-unavailable programs. AFL's binary-only fuzzing mode, QEMU-AFL, is typically 2-5X slower than its source-available fuzzing mode. The slowdown is largely caused by the heavy dynamic instrumentation. Recent fuzzing techniques use Intel Processor Tracing (PT), a light-weight tracing feature supported by recent Intel CPUs, to remove the need of dynamic instrumentation. However, we found that these PT-based fuzzing techniques are even slower than QEMU-AFL when fuzzing real-world programs, making them less effective than QEMU-AFL. This poor performance is caused by the slow extraction of code coverage information from highly compressed PT traces. In this work, we present the design and implementation of PTrix, which fully unleashes the benefits of PT for fuzzing via three novel techniques. First, PTrix introduces a scheme to highly parallel the processing of PT trace and target program execution. Second, it directly takes decoded PT trace as feedback for fuzzing, avoiding the expensive reconstruction of code coverage information. Third, PTrix maintains the new feedback with stronger feedback than edge-based code coverage, which helps reach new code space and defects that AFL may not. We evaluated PTrix by comparing its performance with the state-of-the-art fuzzers. Our results show that, given the same amount of time, PTrix achieves a significantly higher fuzzing speed and reaches into code regions missed by the other fuzzers. In addition, PTrix identifies 35 new vulnerabilities in a set of previously well-fuzzed binaries, showing its ability to complement existing fuzzers.

LGNov 7, 2018
Explaining Deep Learning Models - A Bayesian Non-parametric Approach

Wenbo Guo, Sui Huang, Yunzhe Tao et al.

Understanding and interpreting how machine learning (ML) models make decisions have been a big challenge. While recent research has proposed various technical approaches to provide some clues as to how an ML model makes individual predictions, they cannot provide users with an ability to inspect a model as a complete entity. In this work, we propose a novel technical approach that augments a Bayesian non-parametric regression mixture model with multiple elastic nets. Using the enhanced mixture model, we can extract generalizable insights for a target model through a global approximation. To demonstrate the utility of our approach, we evaluate it on different ML models in the context of image recognition. The empirical results indicate that our proposed approach not only outperforms the state-of-the-art techniques in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of the target ML models.

SEOct 25, 2018
Fine-Grained Library Customization

Linhai Song, Xinyu Xing

Code bloat widely exists in production-run software. Left untackled, it not only degrades software performance but also increases its attack surface. In this work, we conduct a case study to understand this issue in statically linked libraries. To be specific, we analyze midilib, a software package enclosing statically linked libraries. We show that it is possible to leverage dependence analysis to trim the resultless code statements re- siding in a target library. With this observation, we believe it is possible to build a tool to automatically cut off code pertaining to resultless operations.

LGJan 16, 2018
A Comparative Study of Rule Extraction for Recurrent Neural Networks

Qinglong Wang, Kaixuan Zhang, Alexander G. Ororbia et al.

Understanding recurrent networks through rule extraction has a long history. This has taken on new interests due to the need for interpreting or verifying neural networks. One basic form for representing stateful rules is deterministic finite automata (DFA). Previous research shows that extracting DFAs from trained second-order recurrent networks is not only possible but also relatively stable. Recently, several new types of recurrent networks with more complicated architectures have been introduced. These handle challenging learning tasks usually involving sequential data. However, it remains an open problem whether DFAs can be adequately extracted from these models. Specifically, it is not clear how DFA extraction will be affected when applied to different recurrent networks trained on data sets with different levels of complexity. Here, we investigate DFA extraction on several widely adopted recurrent networks that are trained to learn a set of seven regular Tomita grammars. We first formally analyze the complexity of Tomita grammars and categorize these grammars according to that complexity. Then we empirically evaluate different recurrent networks for their performance of DFA extraction on all Tomita grammars. Our experiments show that for most recurrent networks, their extraction performance decreases as the complexity of the underlying grammar increases. On grammars of lower complexity, most recurrent networks obtain desirable extraction performance. As for grammars with the highest level of complexity, while several complicated models fail with only certain recurrent networks having satisfactory extraction performance.

LGSep 29, 2017
An Empirical Evaluation of Rule Extraction from Recurrent Neural Networks

Qinglong Wang, Kaixuan Zhang, Alexander G. Ororbia et al.

Rule extraction from black-box models is critical in domains that require model validation before implementation, as can be the case in credit scoring and medical diagnosis. Though already a challenging problem in statistical learning in general, the difficulty is even greater when highly non-linear, recursive models, such as recurrent neural networks (RNNs), are fit to data. Here, we study the extraction of rules from second-order recurrent neural networks trained to recognize the Tomita grammars. We show that production rules can be stably extracted from trained RNNs and that in certain cases the rules outperform the trained RNNs.

LGMay 23, 2017
Towards Interrogating Discriminative Machine Learning Models

Wenbo Guo, Kaixuan Zhang, Lin Lin et al.

It is oftentimes impossible to understand how machine learning models reach a decision. While recent research has proposed various technical approaches to provide some clues as to how a learning model makes individual decisions, they cannot provide users with ability to inspect a learning model as a complete entity. In this work, we propose a new technical approach that augments a Bayesian regression mixture model with multiple elastic nets. Using the enhanced mixture model, we extract explanations for a target model through global approximation. To demonstrate the utility of our approach, we evaluate it on different learning models covering the tasks of text mining and image recognition. Our results indicate that the proposed approach not only outperforms the state-of-the-art technique in explaining individual decisions but also provides users with an ability to discover the vulnerabilities of a learning model.

CRApr 19, 2017
TrustShadow: Secure Execution of Unmodified Applications with ARM TrustZone

Le Guan, Peng Liu, Xinyu Xing et al.

The rapid evolution of Internet-of-Things (IoT) technologies has led to an emerging need to make it smarter. A variety of applications now run simultaneously on an ARM-based processor. For example, devices on the edge of the Internet are provided with higher horsepower to be entrusted with storing, processing and analyzing data collected from IoT devices. This significantly improves efficiency and reduces the amount of data that needs to be transported to the cloud for data processing, analysis and storage. However, commodity OSes are prone to compromise. Once they are exploited, attackers can access the data on these devices. Since the data stored and processed on the devices can be sensitive, left untackled, this is particularly disconcerting. In this paper, we propose a new system, TrustShadow that shields legacy applications from untrusted OSes. TrustShadow takes advantage of ARM TrustZone technology and partitions resources into the secure and normal worlds. In the secure world, TrustShadow constructs a trusted execution environment for security-critical applications. This trusted environment is maintained by a lightweight runtime system that coordinates the communication between applications and the ordinary OS running in the normal world. The runtime system does not provide system services itself. Rather, it forwards requests for system services to the ordinary OS, and verifies the correctness of the responses. To demonstrate the efficiency of this design, we prototyped TrustShadow on a real chip board with ARM TrustZone support, and evaluated its performance using both microbenchmarks and real-world applications. We showed TrustShadow introduces only negligible overhead to real-world applications.

LGDec 5, 2016
Learning Adversary-Resistant Deep Neural Networks

Qinglong Wang, Wenbo Guo, Kaixuan Zhang et al.

Deep neural networks (DNNs) have proven to be quite effective in a vast array of machine learning tasks, with recent examples in cyber security and autonomous vehicles. Despite the superior performance of DNNs in these applications, it has been recently shown that these models are susceptible to a particular type of attack that exploits a fundamental flaw in their design. This attack consists of generating particular synthetic examples referred to as adversarial samples. These samples are constructed by slightly manipulating real data-points in order to "fool" the original DNN model, forcing it to mis-classify previously correctly classified samples with high confidence. Addressing this flaw in the model is essential if DNNs are to be used in critical applications such as those in cyber security. Previous work has provided various learning algorithms to enhance the robustness of DNN models, and they all fall into the tactic of "security through obscurity". This means security can be guaranteed only if one can obscure the learning algorithms from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms are still susceptible to adversarial samples. In this work, we investigate this issue shared across previous research work and propose a generic approach to escalate a DNN's resistance to adversarial samples. More specifically, our approach integrates a data transformation module with a DNN, making it robust even if we reveal the underlying learning algorithm. To demonstrate the generality of our proposed approach and its potential for handling cyber security applications, we evaluate our method and several other existing solutions on datasets publicly available. Our results indicate that our approach typically provides superior classification performance and resistance in comparison with state-of-art solutions.

CRNov 2, 2016
Context-aware System Service Call-oriented Symbolic Execution of Android Framework with Application to Exploit Generation

Lannan Luo, Qiang Zeng, Chen Cao et al.

Android Framework is a layer of software that exists in every Android system managing resources of all Android apps. A vulnerability in Android Framework can lead to severe hacks, such as destroying user data and leaking private information. With tens of millions of Android devices unpatched due to Android fragmentation, vulnerabilities in Android Framework certainly attract attackers to exploit them. So far, enormous manual effort is needed to craft such exploits. To our knowledge, no research has been done on automatic generation of exploits that take advantage of Android Framework vulnerabilities. We make a first step towards this goal by applying symbolic execution of Android Framework to finding bugs and generating exploits. Several challenges have been raised by the task. (1) The information of an app flows to Android Framework in multiple intricate steps, making it difficult to identify symbolic inputs. (2) Android Framework has a complex initialization phase, which exacerbates the state space explosion problem. (3) A straightforward design that builds the symbolic executor as a layer inside the Android system will not work well: not only does the implementation have to ensure the compatibility with the Android system, but it needs to be maintained whenever Android gets updated. We present novel ideas and techniques to resolve the challenges, and have built the first system for symbolic execution of Android Framework. It fundamentally changes the state of the art in exploit generation on the Android system, and has been applied to constructing new techniques for finding vulnerabilities.

LGOct 6, 2016
Using Non-invertible Data Transformations to Build Adversarial-Robust Neural Networks

Qinglong Wang, Wenbo Guo, Alexander G. Ororbia et al.

Deep neural networks have proven to be quite effective in a wide variety of machine learning tasks, ranging from improved speech recognition systems to advancing the development of autonomous vehicles. However, despite their superior performance in many applications, these models have been recently shown to be susceptible to a particular type of attack possible through the generation of particular synthetic examples referred to as adversarial samples. These samples are constructed by manipulating real examples from the training data distribution in order to "fool" the original neural model, resulting in misclassification (with high confidence) of previously correctly classified samples. Addressing this weakness is of utmost importance if deep neural architectures are to be applied to critical applications, such as those in the domain of cybersecurity. In this paper, we present an analysis of this fundamental flaw lurking in all neural architectures to uncover limitations of previously proposed defense mechanisms. More importantly, we present a unifying framework for protecting deep neural models using a non-invertible data transformation--developing two adversary-resilient architectures utilizing both linear and nonlinear dimensionality reduction. Empirical results indicate that our framework provides better robustness compared to state-of-art solutions while having negligible degradation in accuracy.

LGOct 5, 2016
Adversary Resistant Deep Neural Networks with an Application to Malware Detection

Qinglong Wang, Wenbo Guo, Kaixuan Zhang et al.

Beyond its highly publicized victories in Go, there have been numerous successful applications of deep learning in information retrieval, computer vision and speech recognition. In cybersecurity, an increasing number of companies have become excited about the potential of deep learning, and have started to use it for various security incidents, the most popular being malware detection. These companies assert that deep learning (DL) could help turn the tide in the battle against malware infections. However, deep neural networks (DNNs) are vulnerable to adversarial samples, a flaw that plagues most if not all statistical learning models. Recent research has demonstrated that those with malicious intent can easily circumvent deep learning-powered malware detection by exploiting this flaw. In order to address this problem, previous work has developed various defense mechanisms that either augmenting training data or enhance model's complexity. However, after a thorough analysis of the fundamental flaw in DNNs, we discover that the effectiveness of current defenses is limited and, more importantly, cannot provide theoretical guarantees as to their robustness against adversarial sampled-based attacks. As such, we propose a new adversary resistant technique that obstructs attackers from constructing impactful adversarial samples by randomly nullifying features within samples. In this work, we evaluate our proposed technique against a real world dataset with 14,679 malware variants and 17,399 benign programs. We theoretically validate the robustness of our technique, and empirically show that our technique significantly boosts DNN robustness to adversarial samples while maintaining high accuracy in classification. To demonstrate the general applicability of our proposed method, we also conduct experiments using the MNIST and CIFAR-10 datasets, generally used in image recognition research.

CRSep 8, 2016
From Physical to Cyber: Escalating Protection for Personalized Auto Insurance

Le Guan, Jun Xu, Shuai Wang et al.

Nowadays, auto insurance companies set personalized insurance rate based on data gathered directly from their customers' cars. In this paper, we show such a personalized insurance mechanism -- wildly adopted by many auto insurance companies -- is vulnerable to exploit. In particular, we demonstrate that an adversary can leverage off-the-shelf hardware to manipulate the data to the device that collects drivers' habits for insurance rate customization and obtain a fraudulent insurance discount. In response to this type of attack, we also propose a defense mechanism that escalates the protection for insurers' data collection. The main idea of this mechanism is to augment the insurer's data collection device with the ability to gather unforgeable data acquired from the physical world, and then leverage these data to identify manipulated data points. Our defense mechanism leveraged a statistical model built on unmanipulated data and is robust to manipulation methods that are not foreseen previously. We have implemented this defense mechanism as a proof-of-concept prototype and tested its effectiveness in the real world. Our evaluation shows that our defense mechanism exhibits a false positive rate of 0.032 and a false negative rate of 0.013.

CRApr 12, 2012
An Empirical Study of Spam and Prevention Mechanisms in Online Video Chat Services

Xinyu Xing, Junho Ahn, Wenke Lee et al.

Recently, online video chat services are becoming increasingly popular. While experiencing tremendous growth, online video chat services have also become yet another spamming target. Unlike spam propagated via traditional medium like emails and social networks, we find that spam propagated via online video chat services is able to draw much larger attention from the users. We have conducted several experiments to investigate spam propagation on Chatroulette - the largest online video chat website. We have found that the largest spam campaign on online video chat websites is dating scams. Our study indicates that spam carrying dating or pharmacy scams have much higher clickthrough rates than email spam carrying the same content. In particular, dating scams reach a clickthrough rate of 14.97%. We also examined and analysed spam prevention mechanisms that online video chat websites have designed and implemented. Our study indicates that the prevention mechanisms either harm legitimate user experience or can be easily bypassed.