Shanchieh Jay Yang

CR
h-index24
14papers
91citations
Novelty39%
AI Score36

14 Papers

AIJun 24, 2023
On the Uses of Large Language Models to Interpret Ambiguous Cyberattack Descriptions

Reza Fayyazi, Shanchieh Jay Yang

The volume, variety, and velocity of change in vulnerabilities and exploits have made incident threat analysis challenging with human expertise and experience along. Tactics, Techniques, and Procedures (TTPs) are to describe how and why attackers exploit vulnerabilities. However, a TTP description written by one security professional can be interpreted very differently by another, leading to confusion in cybersecurity operations or even business, policy, and legal decisions. Meanwhile, advancements in AI have led to the increasing use of Natural Language Processing (NLP) algorithms to assist the various tasks in cyber operations. With the rise of Large Language Models (LLMs), NLP tasks have significantly improved because of the LLM's semantic understanding and scalability. This leads us to question how well LLMs can interpret TTPs or general cyberattack descriptions to inform analysts of the intended purposes of cyberattacks. We propose to analyze and compare the direct use of LLMs (e.g., GPT-3.5) versus supervised fine-tuning (SFT) of small-scale-LLMs (e.g., BERT) to study their capabilities in predicting ATT&CK tactics. Our results reveal that the small-scale-LLMs with SFT provide a more focused and clearer differentiation between the ATT&CK tactics (if such differentiation exists). On the other hand, direct use of LLMs offer a broader interpretation of cyberattack techniques. When treating more general cases, despite the power of LLMs, inherent ambiguity exists and limits their predictive power. We then summarize the challenges and recommend research directions on LLMs to treat the inherent ambiguity of TTP descriptions used in various cyber operations.

LGAug 28, 2023
Are Existing Out-Of-Distribution Techniques Suitable for Network Intrusion Detection?

Andrea Corsini, Shanchieh Jay Yang

Machine learning (ML) has become increasingly popular in network intrusion detection. However, ML-based solutions always respond regardless of whether the input data reflects known patterns, a common issue across safety-critical applications. While several proposals exist for detecting Out-Of-Distribution (OOD) in other fields, it remains unclear whether these approaches can effectively identify new forms of intrusions for network security. New attacks, not necessarily affecting overall distributions, are not guaranteed to be clearly OOD as instead, images depicting new classes are in computer vision. In this work, we investigate whether existing OOD detectors from other fields allow the identification of unknown malicious traffic. We also explore whether more discriminative and semantically richer embedding spaces within models, such as those created with contrastive learning and multi-class tasks, benefit detection. Our investigation covers a set of six OOD techniques that employ different detection strategies. These techniques are applied to models trained in various ways and subsequently exposed to unknown malicious traffic from the same and different datasets (network environments). Our findings suggest that existing detectors can identify a consistent portion of new malicious traffic, and that improved embedding spaces enhance detection. We also demonstrate that simple combinations of certain detectors can identify almost 100% of malicious traffic in our tested scenarios.

CRDec 28, 2022
HeATed Alert Triage (HeAT): Transferrable Learning to Extract Multistage Attack Campaigns

Stephen Moskal, Shanchieh Jay Yang

With growing sophistication and volume of cyber attacks combined with complex network structures, it is becoming extremely difficult for security analysts to corroborate evidences to identify multistage campaigns on their network. This work develops HeAT (Heated Alert Triage): given a critical indicator of compromise (IoC), e.g., a severe IDS alert, HeAT produces a HeATed Attack Campaign (HAC) depicting the multistage activities that led up to the critical event. We define the concept of "Alert Episode Heat" to represent the analysts opinion of how much an event contributes to the attack campaign of the critical IoC given their knowledge of the network and security expertise. Leveraging a network-agnostic feature set, HeAT learns the essence of analyst's assessment of "HeAT" for a small set of IoC's, and applies the learned model to extract insightful attack campaigns for IoC's not seen before, even across networks by transferring what have been learned. We demonstrate the capabilities of HeAT with data collected in Collegiate Penetration Testing Competition (CPTC) and through collaboration with a real-world SOC. We developed HeAT-Gain metrics to demonstrate how analysts may assess and benefit from the extracted attack campaigns in comparison to common practices where IP addresses are used to corroborate evidences. Our results demonstrates the practical uses of HeAT by finding campaigns that span across diverse attack stages, remove a significant volume of irrelevant alerts, and achieve coherency to the analyst's original assessments.

CRDec 30, 2023
Advancing TTP Analysis: Harnessing the Power of Large Language Models with Retrieval Augmented Generation

Reza Fayyazi, Rozhina Taghdimi, Shanchieh Jay Yang

Tactics, Techniques, and Procedures (TTPs) outline the methods attackers use to exploit vulnerabilities. The interpretation of TTPs in the MITRE ATT&CK framework can be challenging for cybersecurity practitioners due to presumed expertise and complex dependencies. Meanwhile, advancements with Large Language Models (LLMs) have led to recent surge in studies exploring its uses in cybersecurity operations. It is, however, unclear how LLMs can be used in an efficient and proper way to provide accurate responses for critical domains such as cybersecurity. This leads us to investigate how to better use two types of LLMs: small-scale encoder-only (e.g., RoBERTa) and larger decoder-only (e.g., GPT-3.5) LLMs to comprehend and summarize TTPs with the intended purposes (i.e., tactics) of a cyberattack procedure. This work studies and compares the uses of supervised fine-tuning (SFT) of encoder-only LLMs vs. Retrieval Augmented Generation (RAG) for decoder-only LLMs (without fine-tuning). Both SFT and RAG techniques presumably enhance the LLMs with relevant contexts for each cyberattack procedure. Our studies show decoder-only LLMs with RAG achieves better performance than encoder-only models with SFT, particularly when directly relevant context is extracted by RAG. The decoder-only results could suffer low `Precision' while achieving high `Recall'. Our findings further highlight a counter-intuitive observation that more generic prompts tend to yield better predictions of cyberattack tactics than those that are more specifically tailored.

CRSep 9, 2025
Guided Reasoning in LLM-Driven Penetration Testing Using Structured Attack Trees

Katsuaki Nakano, Reza Fayyazi, Shanchieh Jay Yang et al.

Recent advances in Large Language Models (LLMs) have driven interest in automating cybersecurity penetration testing workflows, offering the promise of faster and more consistent vulnerability assessment for enterprise systems. Existing LLM agents for penetration testing primarily rely on self-guided reasoning, which can produce inaccurate or hallucinated procedural steps. As a result, the LLM agent may undertake unproductive actions, such as exploiting unused software libraries or generating cyclical responses that repeat prior tactics. In this work, we propose a guided reasoning pipeline for penetration testing LLM agents that incorporates a deterministic task tree built from the MITRE ATT&CK Matrix, a proven penetration testing kll chain, to constrain the LLM's reaoning process to explicitly defined tactics, techniques, and procedures. This anchors reasoning in proven penetration testing methodologies and filters out ineffective actions by guiding the agent towards more productive attack procedures. To evaluate our approach, we built an automated penetration testing LLM agent using three LLMs (Llama-3-8B, Gemini-1.5, and GPT-4) and applied it to navigate 10 HackTheBox cybersecurity exercises with 103 discrete subtasks representing real-world cyberattack scenarios. Our proposed reasoning pipeline guided the LLM agent through 71.8\%, 72.8\%, and 78.6\% of subtasks using Llama-3-8B, Gemini-1.5, and GPT-4, respectively. Comparatively, the state-of-the-art LLM penetration testing tool using self-guided reasoning completed only 13.5\%, 16.5\%, and 75.7\% of subtasks and required 86.2\%, 118.7\%, and 205.9\% more model queries. This suggests that incorporating a deterministic task tree into LLM reasoning pipelines can enhance the accuracy and efficiency of automated cybersecurity assessments

CRJun 12, 2025
LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis

Reza Fayyazi, Michael Zuzak, Shanchieh Jay Yang

Large Language Models (LLMs) are increasingly used for cybersecurity threat analysis, but their deployment in security-sensitive environments raises trust and safety concerns. With over 21,000 vulnerabilities disclosed in 2025, manual analysis is infeasible, making scalable and verifiable AI support critical. When querying LLMs, dealing with emerging vulnerabilities is challenging as they have a training cut-off date. While Retrieval-Augmented Generation (RAG) can inject up-to-date context to alleviate the cut-off date limitation, it remains unclear how much LLMs rely on retrieved evidence versus the model's internal knowledge, and whether the retrieved information is meaningful or even correct. This uncertainty could mislead security analysts, mis-prioritize patches, and increase security risks. Therefore, this work proposes LLM Embedding-based Attribution (LEA) to analyze the generated responses for vulnerability exploitation analysis. More specifically, LEA quantifies the relative contribution of internal knowledge vs. retrieved content in the generated responses. We evaluate LEA on 500 critical vulnerabilities disclosed between 2016 and 2025, across three RAG settings -- valid, generic, and incorrect -- using three state-of-the-art LLMs. Our results demonstrate LEA's ability to detect clear distinctions between non-retrieval, generic-retrieval, and valid-retrieval scenarios with over 95% accuracy on larger models. Finally, we demonstrate the limitations posed by incorrect retrieval of vulnerability information and raise a cautionary note to the cybersecurity community regarding the blind reliance on LLMs and RAG for vulnerability analysis. LEA offers security analysts with a metric to audit RAG-enhanced workflows, improving the transparent and trustworthy deployment of AI in cybersecurity threat analysis.

CLJun 25, 2024
Evaluating the Efficacy of Foundational Models: Advancing Benchmarking Practices to Enhance Fine-Tuning Decision-Making

Oluyemi Enoch Amujo, Shanchieh Jay Yang

Recently, large language models (LLMs) have expanded into various domains. However, there remains a need to evaluate how these models perform when prompted with commonplace queries compared to domain-specific queries, which may be useful for benchmarking prior to fine-tuning for domain-specific downstream tasks. This study evaluates LLMs, specifically Gemma-2B and Gemma-7B, across diverse domains, including cybersecurity, medicine, and finance, compared to common knowledge queries. This study utilizes a comprehensive methodology to assess foundational models, which includes problem formulation, data analysis, and the development of ThroughCut, a novel outlier detection technique that automatically identifies response throughput outliers based on their conciseness. This methodological rigor enhances the credibility of the presented evaluation frameworks. This study focused on assessing inference time, response length, throughput, quality, and resource utilization and investigated the correlations between these factors. The results indicate that model size and types of prompts used for inference significantly influenced response length and quality. In addition, common prompts, which include various types of queries, generate diverse and inconsistent responses at irregular intervals. In contrast, domain-specific prompts consistently generate concise responses within a reasonable time. Overall, this study underscores the need for comprehensive evaluation frameworks to enhance the reliability of benchmarking procedures in multidomain AI research.

CRJul 6, 2021
SAGE: Intrusion Alert-driven Attack Graph Extractor

Azqa Nadeem, Sicco Verwer, Shanchieh Jay Yang

Attack graphs (AG) are used to assess pathways availed by cyber adversaries to penetrate a network. State-of-the-art approaches for AG generation focus mostly on deriving dependencies between system vulnerabilities based on network scans and expert knowledge. In real-world operations however, it is costly and ineffective to rely on constant vulnerability scanning and expert-crafted AGs. We propose to automatically learn AGs based on actions observed through intrusion alerts, without prior expert knowledge. Specifically, we develop an unsupervised sequence learning system, SAGE, that leverages the temporal and probabilistic dependence between alerts in a suffix-based probabilistic deterministic finite automaton (S-PDFA) -- a model that accentuates infrequent severe alerts and summarizes paths leading to them. AGs are then derived from the S-PDFA on a per-objective, per-victim basis. Tested with intrusion alerts collected through Collegiate Penetration Testing Competition, SAGE compresses over 330k alerts into 93 AGs. These AGs reflect the strategies used by the participating teams. The AGs are succinct, interpretable, and capture behavioral dynamics, e.g., that attackers will often follow shorter paths to re-exploit objectives.

CRJun 15, 2021
On the Evaluation of Sequential Machine Learning for Network Intrusion Detection

Andrea Corsini, Shanchieh Jay Yang, Giovanni Apruzzese

Recent advances in deep learning renewed the research interests in machine learning for Network Intrusion Detection Systems (NIDS). Specifically, attention has been given to sequential learning models, due to their ability to extract the temporal characteristics of Network traffic Flows (NetFlows), and use them for NIDS tasks. However, the applications of these sequential models often consist of transferring and adapting methodologies directly from other fields, without an in-depth investigation on how to leverage the specific circumstances of cybersecurity scenarios; moreover, there is a lack of comprehensive studies on sequential models that rely on NetFlow data, which presents significant advantages over traditional full packet captures. We tackle this problem in this paper. We propose a detailed methodology to extract temporal sequences of NetFlows that denote patterns of malicious activities. Then, we apply this methodology to compare the efficacy of sequential learning models against traditional static learning models. In particular, we perform a fair comparison of a `sequential' Long Short-Term Memory (LSTM) against a `static' Feedforward Neural Networks (FNN) in distinct environments represented by two well-known datasets for NIDS: the CICIDS2017 and the CTU13. Our results highlight that LSTM achieves comparable performance to FNN in the CICIDS2017 with over 99.5\% F1-score; while obtaining superior performance in the CTU13, with 95.7\% F1-score against 91.5\%. This paper thus paves the way to future applications of sequential learning models for NIDS.

CRMar 25, 2021
Near Real-time Learning and Extraction of Attack Models from Intrusion Alerts

Shanchieh Jay Yang, Ahmet Okutan, Gordon Werner et al.

Critical and sophisticated cyberattacks often take multitudes of reconnaissance, exploitations, and obfuscation techniques to penetrate through well protected enterprise networks. The discovery and detection of attacks, though needing continuous efforts, is no longer sufficient. Security Operation Center (SOC) analysts are overwhelmed by the significant volume of intrusion alerts without being able to extract actionable intelligence. Recognizing this challenge, this paper describes the advances and findings through deploying ASSERT to process intrusion alerts from OmniSOC in collaboration with the Center for Applied Cybersecurity Research (CACR) at Indiana University. ASSERT utilizes information theoretic unsupervised learning to extract and update `attack models' in near real-time without expert knowledge. It consumes streaming intrusion alerts and generates a small number of statistical models for SOC analysts to comprehend ongoing and emerging attacks in a timely manner. This paper presents the architecture and key processes of ASSERT and discusses a few real-world attack models to highlight the use-cases that benefit SOC operations. The research team is developing a light-weight containerized ASSERT that will be shared through a public repository to help the community combat the overwhelming intrusion alerts.

CRFeb 18, 2020
Cyberattack Action-Intent-Framework for Mapping Intrusion Observables

Stephen Moskal, Shanchieh Jay Yang

The techniques and tactics used by cyber adversaries are becoming more sophisticated, ironically, as defense getting stronger and the cost of a breach continuing to rise. Understanding the thought processes and behaviors of adversaries is extremely challenging as high profile or even amateur attackers have no incentive to share the trades associated with their illegal activities. One opportunity to observe the actions the adversaries perform is through the use of Intrusion Detection Systems (IDS) which generate alerts in the event that suspicious behavior was detected. The alerts raised by these systems typically describe the suspicious actions via the form of attack 'signature', which do not necessarily reveal the true intent of the attacker performing the action. Meanwhile, several high level frameworks exist to describe the sequence or chain of action types an adversary might perform. These frameworks, however, do not connect the action types to observables of standard intrusion detection systems, nor describing the plausible intents of the adversarial actions. To address these gaps, this work proposes the Action-Intent Framework (AIF) to complement existing Cyber Attack Kill Chains and Attack Taxonomies. The AIF defines a set of Action-Intent States (AIS) at two levels of description: the Macro-AIS describes 'what' the attacker is trying to achieve and the Micro-AIS describes "how" the intended goal is achieved. A full description of both the Macro is provided along with a set of guiding principals of how the AIS is derived and added to the framework.

LGAug 3, 2019
On the Veracity of Cyber Intrusion Alerts Synthesized by Generative Adversarial Networks

Christopher Sweet, Stephen Moskal, Shanchieh Jay Yang

Recreating cyber-attack alert data with a high level of fidelity is challenging due to the intricate interaction between features, non-homogeneity of alerts, and potential for rare yet critical samples. Generative Adversarial Networks (GANs) have been shown to effectively learn complex data distributions with the intent of creating increasingly realistic data. This paper presents the application of GANs to cyber-attack alert data and shows that GANs not only successfully learn to generate realistic alerts, but also reveal feature dependencies within alerts. This is accomplished by reviewing the intersection of histograms for varying alert-feature combinations between the ground truth and generated datsets. Traditional statistical metrics, such as conditional and joint entropy, are also employed to verify the accuracy of these dependencies. Finally, it is shown that a Mutual Information constraint on the network can be used to increase the generation of low probability, critical, alert values. By mapping alerts to a set of attack stages it is shown that the output of these low probability alerts has a direct contextual meaning for Cyber Security analysts. Overall, this work provides the basis for generating new cyber intrusion alerts and provides evidence that synthesized alerts emulate critical dependencies from the source dataset.

CRSep 5, 2018
Probabilistic Modeling and Inference for Obfuscated Cyber Attack Sequences

Haitao Du, Shanchieh Jay Yang

A key element in defending computer networks is to recognize the types of cyber attacks based on the observed malicious activities. Obfuscation onto what could have been observed of an attack sequence may lead to mis-interpretation of its effect and intent, leading to ineffective defense or recovery deployments. This work develops probabilistic graphical models to generalize a few obfuscation techniques and to enable analyses of the Expected Classification Accuracy (ECA) as a result of these different obfuscation on various attack models. Determining the ECA is a NP-Hard problem due to the combinatorial number of possibilities. This paper presents several polynomial-time algorithms to find the theoretically bounded approximation of ECA under different attack obfuscation models. Comprehensive simulation shows the impact on ECA due to alteration, insertion and removal of attack action sequence, with increasing observation length, level of obfuscation and model complexity.

CRMar 26, 2018
Forecasting Cyber Attacks with Imbalanced Data Sets and Different Time Granularities

Ahmet Okutan, Shanchieh Jay Yang, Katie McConky

If cyber incidents are predicted a reasonable amount of time before they occur, defensive actions to prevent their destructive effects could be planned. Unfortunately, most of the time we do not have enough observables of the malicious activities before they are already under way. Therefore, this work suggests to use unconventional signals extracted from various data sources with different time granularities to predict cyber incidents for target entities. A Bayesian network is used to predict cyber attacks where the unconventional signals are used as indicative random variables. This work also develops a novel minority class over sampling technique to improve cyber attack prediction on imbalanced data sets. The results show that depending on the selected time granularity, the unconventional signals are able to predict cyber attacks for the anonimyzed target organization even though the signals are not explicitly related to that organization. Furthermore, the minority over sampling approach developed achieves better performance compared to the existing filtering techniques in the literature.