Phan The Duy

CR
h-index13
13papers
88citations
Novelty50%
AI Score53

13 Papers

CRSep 15, 2023
VulnSense: Efficient Vulnerability Detection in Ethereum Smart Contracts by Multimodal Learning with Graph Neural Network and Language Model

Phan The Duy, Nghi Hoang Khoa, Nguyen Huu Quyen et al.

This paper presents VulnSense framework, a comprehensive approach to efficiently detect vulnerabilities in Ethereum smart contracts using a multimodal learning approach on graph-based and natural language processing (NLP) models. Our proposed framework combines three types of features from smart contracts comprising source code, opcode sequences, and control flow graph (CFG) extracted from bytecode. We employ Bidirectional Encoder Representations from Transformers (BERT), Bidirectional Long Short-Term Memory (BiLSTM) and Graph Neural Network (GNN) models to extract and analyze these features. The final layer of our multimodal approach consists of a fully connected layer used to predict vulnerabilities in Ethereum smart contracts. Addressing limitations of existing vulnerability detection methods relying on single-feature or single-model deep learning techniques, our method surpasses accuracy and effectiveness constraints. We assess VulnSense using a collection of 1.769 smart contracts derived from the combination of three datasets: Curated, SolidiFI-Benchmark, and Smartbugs Wild. We then make a comparison with various unimodal and multimodal learning techniques contributed by GNN, BiLSTM and BERT architectures. The experimental outcomes demonstrate the superior performance of our proposed approach, achieving an average accuracy of 77.96\% across all three categories of vulnerable smart contracts.

CRSep 26, 2023
XGV-BERT: Leveraging Contextualized Language Model and Graph Neural Network for Efficient Software Vulnerability Detection

Vu Le Anh Quan, Chau Thuan Phat, Kiet Van Nguyen et al.

With the advancement of deep learning (DL) in various fields, there are many attempts to reveal software vulnerabilities by data-driven approach. Nonetheless, such existing works lack the effective representation that can retain the non-sequential semantic characteristics and contextual relationship of source code attributes. Hence, in this work, we propose XGV-BERT, a framework that combines the pre-trained CodeBERT model and Graph Neural Network (GCN) to detect software vulnerabilities. By jointly training the CodeBERT and GCN modules within XGV-BERT, the proposed model leverages the advantages of large-scale pre-training, harnessing vast raw data, and transfer learning by learning representations for training data through graph convolution. The research results demonstrate that the XGV-BERT method significantly improves vulnerability detection accuracy compared to two existing methods such as VulDeePecker and SySeVR. For the VulDeePecker dataset, XGV-BERT achieves an impressive F1-score of 97.5%, significantly outperforming VulDeePecker, which achieved an F1-score of 78.3%. Again, with the SySeVR dataset, XGV-BERT achieves an F1-score of 95.5%, surpassing the results of SySeVR with an F1-score of 83.5%.

CRSep 27, 2023
Raijū: Reinforcement Learning-Guided Post-Exploitation for Automating Security Assessment of Network Systems

Van-Hau Pham, Hien Do Hoang, Phan Thanh Trung et al.

In order to assess the risks of a network system, it is important to investigate the behaviors of attackers after successful exploitation, which is called post-exploitation. Although there are various efficient tools supporting post-exploitation implementation, no application can automate this process. Most of the steps of this process are completed by experts who have profound knowledge of security, known as penetration testers or pen-testers. To this end, our study proposes the Raijū framework, a Reinforcement Learning (RL)-driven automation approach that assists pen-testers in quickly implementing the process of post-exploitation for security-level evaluation in network systems. We implement two RL algorithms, Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO), to train specialized agents capable of making intelligent actions, which are Metasploit modules to automatically launch attacks of privileges escalation, gathering hashdump, and lateral movement. By leveraging RL, we aim to empower these agents with the ability to autonomously select and execute actions that can exploit vulnerabilities in target systems. This approach allows us to automate certain aspects of the penetration testing workflow, making it more efficient and responsive to emerging threats and vulnerabilities. The experiments are performed in four real environments with agents trained in thousands of episodes. The agents automatically select actions and launch attacks on the environments and achieve over 84\% of successful attacks with under 55 attack steps given. Moreover, the A2C algorithm has proved extremely effective in the selection of proper actions for automation of post-exploitation.

CRSep 15, 2023
XFedHunter: An Explainable Federated Learning Framework for Advanced Persistent Threat Detection in SDN

Huynh Thai Thi, Ngo Duc Hoang Son, Phan The Duy et al.

Advanced Persistent Threat (APT) attacks are highly sophisticated and employ a multitude of advanced methods and techniques to target organizations and steal sensitive and confidential information. APT attacks consist of multiple stages and have a defined strategy, utilizing new and innovative techniques and technologies developed by hackers to evade security software monitoring. To effectively protect against APTs, detecting and predicting APT indicators with an explanation from Machine Learning (ML) prediction is crucial to reveal the characteristics of attackers lurking in the network system. Meanwhile, Federated Learning (FL) has emerged as a promising approach for building intelligent applications without compromising privacy. This is particularly important in cybersecurity, where sensitive data and high-quality labeling play a critical role in constructing effective machine learning models for detecting cyber threats. Therefore, this work proposes XFedHunter, an explainable federated learning framework for APT detection in Software-Defined Networking (SDN) leveraging local cyber threat knowledge from many training collaborators. In XFedHunter, Graph Neural Network (GNN) and Deep Learning model are utilized to reveal the malicious events effectively in the large number of normal ones in the network system. The experimental results on NF-ToN-IoT and DARPA TCE3 datasets indicate that our framework can enhance the trust and accountability of ML-based systems utilized for cybersecurity purposes without privacy leakage.

CRSep 25, 2023
On the Effectiveness of Adversarial Samples against Ensemble Learning-based Windows PE Malware Detectors

Trong-Nghia To, Danh Le Kim, Do Thi Thu Hien et al.

Recently, there has been a growing focus and interest in applying machine learning (ML) to the field of cybersecurity, particularly in malware detection and prevention. Several research works on malware analysis have been proposed, offering promising results for both academic and practical applications. In these works, the use of Generative Adversarial Networks (GANs) or Reinforcement Learning (RL) can aid malware creators in crafting metamorphic malware that evades antivirus software. In this study, we propose a mutation system to counteract ensemble learning-based detectors by combining GANs and an RL model, overcoming the limitations of the MalGAN model. Our proposed FeaGAN model is built based on MalGAN by incorporating an RL model called the Deep Q-network anti-malware Engines Attacking Framework (DQEAF). The RL model addresses three key challenges in performing adversarial attacks on Windows Portable Executable malware, including format preservation, executability preservation, and maliciousness preservation. In the FeaGAN model, ensemble learning is utilized to enhance the malware detector's evasion ability, with the generated adversarial patterns. The experimental results demonstrate that 100\% of the selected mutant samples preserve the format of executable files, while certain successes in both executability preservation and maliciousness preservation are achieved, reaching a stable success rate.

38.4CRMar 28
Red-MIRROR: Agentic LLM-based Autonomous Penetration Testing with Reflective Verification and Knowledge-augmented Interaction

Tran Vy Khang, Nguyen Dang Nguyen Khang, Nghi Hoang Khoa et al.

Web applications remain the dominant attack surface in cybersecurity, where vulnerabilities such as SQL injection, XSS, and business logic flaws continue to cause significant data breaches. While penetration testing is effective for identifying these weaknesses, traditional manual approaches are time-consuming and heavily dependent on scarce expert knowledge. Recent Large Language Models (LLM)-based multi-agent systems have shown promise in automating penetration testing, yet they still suffer from critical limitations: over-reliance on parametric knowledge, fragmented session memory, and insufficient validation of attack payloads and responses. This paper proposes Red-MIRROR, a novel multi-agent automated penetration testing system that introduces a tightly coupled memory-reflection backbone to explicitly govern inter-agent reasoning. By synthesizing Retrieval-Augmented Generation (RAG) for external knowledge augmentation, a Shared Recurrent Memory Mechanism (SRMM) for persistent state management, and a Dual-Phase Reflection mechanism for adaptive validation, Red-MIRROR provides a robust solution for complex web exploitation. Empirical evaluation on the XBOW benchmark and Vulhub CVEs shows that Red-MIRROR achieves performance comparable to state-of-the-art agents on Vulhub scenarios, while demonstrating a clear advantage on the XBOW benchmark. On the XBOW benchmark, Red-MIRROR attains an overall success rate of 86.0 percent, outperforming PentestAgent (50.0 percent), AutoPT (46.0 percent), and the VulnBot baseline (6.0 percent). Furthermore, the system achieves a 93.99 percent subtask completion rate, indicating strong long-horizon reasoning and payload refinement capability. Finally, we discuss ethical implications and propose safeguards to mitigate misuse risks.

42.3CRApr 8
PoC-Adapt: Semantic-Aware Automated Vulnerability Reproduction with LLM Multi-Agents and Reinforcement Learning-Driven Adaptive Policy

Phan The Duy, Nguyen Viet Duy, Khoa Ngo-Khanh et al.

While recent approaches leverage large language models (LLMs) and multi-agent pipelines to automatically generate proof-of-concept (PoC) exploits from vulnerability reports, existing systems often suffer from two fundamental limitations: unreliable validation based on surface-level execution signals and high operational cost caused by extensive trial-and-error during exploit generation. In this paper, we present PoC-Adapt, an end-to-end framework for automated PoC generation and verification, architected upon a foundation semantic runtime validation and adaptive policy learning. At the core of PoC-Adapt is a Semantic Oracle that validates exploits by comparing structured pre- and post-execution system states, enabling reliable distinction between true vulnerability exploitation and incidental behavioral changes. To reduce exploration cost, we further introduce an Adaptive Policy Learning mechanism that learns an exploitation policy over semantic states and actions, guiding the exploit agent toward effective strategies with fewer failed attempts. PoC-Adapt is implemented as a multi-agent system comprising specialized agents for root cause analysis, environment building, exploit generation, and semantic validation, coordinated through structured feedback loops. Experimenting on the CWE-Bench-Java and PrimeVul benchmarks shows that PoC-Adapt significantly improves verification reliability by 25% and reduces exploit generation cost compared to prior LLM-based systems, highlighting the importance of semantic validation and learned action policies in automated vulnerability reproduction. Applied to the latest CVE corpus, PoC-Adapt confirmed 12 verified PoC out of 80 reproduce attempts at a cost of $0.42 per generated exploit

26.8CRApr 3
ContractShield: Bridging Semantic-Structural Gaps via Hierarchical Cross-Modal Fusion for Multi-Label Vulnerability Detection in Obfuscated Smart Contracts

Minh-Dai Tran-Duong, Nguyen Hai Phong, Nguyen Chi Thanh et al.

Smart contracts are increasingly targeted by adversaries employing obfuscation techniques such as bogus code injection and control flow manipulation to evade vulnerability detection. Existing multimodal methods often process semantic, temporal, and structural features in isolation and fuse them using simple strategies such as concatenation, which neglects cross-modal interactions and weakens robustness, as obfuscation of a single modality can sharply degrade detection accuracy. To address these challenges, we propose ContractShield, a robust multimodal framework with a novel fusion mechanism that effectively correlates multiple complementary features through a three-level fusion. Self-attention first identifies patterns that indicate vulnerability within each feature space. Cross-modal attention then establishes meaningful connections between complementary signals across modalities. Then, adaptive weighting dynamically calibrates feature contributions based on their reliability under obfuscation. For feature extraction, ContractShield integrates (1) CodeBERT with a sliding window mechanism to capture semantic dependencies in source code, (2) Extended long short-term memory (xLSTM) to model temporal dynamics in opcode sequences, and (3) GATv2 to identify structural invariants in control flow graphs (CFGs) that remain stable across obfuscation. Empirical evaluation demonstrates resilience of ContractShield, achieving a 89 percentage Hamming Score with only a 1-3 percentage drop compared to non-obfuscated data. The framework simultaneously detects five major vulnerability types with 91 percentage F1-score, outperforming state-of-the-art approaches by 6-15 percentage under adversarial conditions.

46.2LGMar 30
ORACAL: A Robust and Explainable Multimodal Framework for Smart Contract Vulnerability Detection with Causal Graph Enrichment

Tran Duong Minh Dai, Triet Huynh Minh Le, M. Ali Babar et al.

Although Graph Neural Networks (GNNs) have shown promise for smart contract vulnerability detection, they still face significant limitations. Homogeneous graph models fail to capture the interplay between control flow and data dependencies, while heterogeneous graph approaches often lack deep semantic understanding, leaving them susceptible to adversarial attacks. Moreover, most black-box models fail to provide explainable evidence, hindering trust in professional audits. To address these challenges, we propose ORACAL (Observable RAG-enhanced Analysis with CausAL reasoning), a heterogeneous multimodal graph learning framework that integrates Control Flow Graph (CFG), Data Flow Graph (DFG), and Call Graph (CG). ORACAL selectively enriches critical subgraphs with expert-level security context from Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs), and employs a causal attention mechanism to disentangle true vulnerability indicators from spurious correlations. For transparency, the framework adopts PGExplainer to generate subgraph-level explanations identifying vulnerability triggering paths. Experiments on large-scale datasets demonstrate that ORACAL achieves state-of-the-art performance, outperforming MANDO-HGT, MTVHunter, GNN-SC, and SCVHunter by up to 39.6 percentage points, with a peak Macro F1 of 91.28% on the primary benchmark. ORACAL maintains strong generalization on out-of-distribution datasets with 91.8% on CGT Weakness and 77.1% on DAppScan. In explainability evaluation, PGExplainer achieves 32.51% Mean Intersection over Union (MIoU) against manually annotated vulnerability triggering paths. Under adversarial attacks, ORACAL limits performance degradation to approximately 2.35% F1 decrease with an Attack Success Rate (ASR) of only 3%, surpassing SCVHunter and MANDO-HGT which exhibit ASRs ranging from 10.91% to 18.73%.

CRSep 16, 2025Code
xOffense: An AI-driven autonomous penetration testing framework with offensive knowledge-enhanced LLMs and multi agent systems

Phung Duc Luong, Le Tran Gia Bao, Nguyen Vu Khai Tam et al.

This work introduces xOffense, an AI-driven, multi-agent penetration testing framework that shifts the process from labor-intensive, expert-driven manual efforts to fully automated, machine-executable workflows capable of scaling seamlessly with computational infrastructure. At its core, xOffense leverages a fine-tuned, mid-scale open-source LLM (Qwen3-32B) to drive reasoning and decision-making in penetration testing. The framework assigns specialized agents to reconnaissance, vulnerability scanning, and exploitation, with an orchestration layer ensuring seamless coordination across phases. Fine-tuning on Chain-of-Thought penetration testing data further enables the model to generate precise tool commands and perform consistent multi-step reasoning. We evaluate xOffense on two rigorous benchmarks: AutoPenBench and AI-Pentest-Benchmark. The results demonstrate that xOffense consistently outperforms contemporary methods, achieving a sub-task completion rate of 79.17%, decisively surpassing leading systems such as VulnBot and PentestGPT. These findings highlight the potential of domain-adapted mid-scale LLMs, when embedded within structured multi-agent orchestration, to deliver superior, cost-efficient, and reproducible solutions for autonomous penetration testing.

CRFeb 20
PenTiDef: Enhancing Privacy and Robustness in Decentralized Federated Intrusion Detection Systems against Poisoning Attacks

Phan The Duy, Nghi Hoang Khoa, Nguyen Tran Anh Quan et al.

The increasing deployment of Federated Learning (FL) in Intrusion Detection Systems (IDS) introduces new challenges related to data privacy, centralized coordination, and susceptibility to poisoning attacks. While significant research has focused on protecting traditional FL-IDS with centralized aggregation servers, there remains a notable gap in addressing the unique challenges of decentralized FL-IDS (DFL-IDS). This study aims to address the limitations of traditional centralized FL-IDS by proposing a novel defense framework tailored for the decentralized FL-IDS architecture, with a focus on privacy preservation and robustness against poisoning attacks. We propose PenTiDef, a privacy-preserving and robust defense framework for DFL-IDS, which incorporates Distributed Differential Privacy (DDP) to protect data confidentiality and utilizes latent space representations (LSR) derived from neural networks to detect malicious updates in the decentralized model aggregation context. To eliminate single points of failure and enhance trust without a centralized aggregation server, PenTiDef employs a blockchain-based decentralized coordination mechanism that manages model aggregation, tracks update history, and supports trust enforcement through smart contracts. Experimental results on CIC-IDS2018 and Edge-IIoTSet demonstrate that PenTiDef consistently outperforms existing defenses (e.g., FLARE, FedCC) across various attack scenarios and data distributions. These findings highlight the potential of PenTiDef as a scalable and secure framework for deploying DFL-based IDS in adversarial environments. By leveraging privacy protection, malicious behavior detection in hidden data, and working without a central server, it provides a useful security solution against real-world attacks from untrust participants.

CRNov 1, 2021
B-DAC: A Decentralized Access Control Framework on Northbound Interface for Securing SDN Using Blockchain

Phan The Duy, Hien Do Hoang, Do Thi Thu Hien et al.

Software-Defined Network (SDN) is a new arising terminology of network architecture with outstanding features of orchestration by decoupling the control plane and the data plane in each network element. Even though it brings several benefits, SDN is vulnerable to a diversity of attacks. Abusing the single point of failure in the SDN controller component, hackers can shut down all network operations. More specifics, a malicious OpenFlow application can access to SDN controller to carry out harmful actions without any limitation owing to the lack of the access control mechanism as a standard in the Northbound. The sensitive information about the whole network such as network topology, flow information, and statistics can be gathered and leaked out. Even worse, the entire network can be taken over by the compromised controller. Hence, it is vital to build a scheme of access control for SDN's Northbound. Furthermore, it must also protect the data integrity and availability during data exchange between application and controller. To address such limitations, we introduce B-DAC, a blockchain-based framework for decentralized authentication and fine-grained access control for the Northbound interface to assist administrators in managing and protecting critical resources. With strict policy enforcement, B-DAC can perform decentralized access control for each request to keep network applications under surveillance for preventing over-privileged activities or security policy conflicts. To demonstrate the feasibility of our approach, we also implement a prototype of this framework to evaluate the security impact, effectiveness, and performance through typical use cases.

CRSep 1, 2020
A survey on Blockchain-based applications for reforming data protection, privacy and security

Phan The Duy, Do Thi Thu Hien, Van-Hau Pham

The modern society, economy and industry have been changed remarkably by many cutting-edge technologies over the last years, and many more are in development and early implementation that will in turn led even wider spread of adoptions and greater alteration. Blockchain technology along with other rising ones is expected to transform virtually every aspect of global business and individuals' lifestyle in some areas. It has been spreading with multi-sector applications from financial services to healthcare, supply chain, and cybersecurity emerging every passing day. Simultaneously, in the digital world, data protection and privacy are the most enormous issues which customers, companies and policymakers also take seriously into consideration due to the recent increase of security breaches and surveillance in reported incidents. In this case, blockchain has the capability and potential to revolutionize trust, security and privacy of individual data in the online world. Hence, the purpose of this paper is to study the actual cases of Blockchain applied in the reformation of privacy and security field by discussing its impacts as well as the opportunities and challenges.