CRJun 1, 2023
Interpreting GNN-based IDS Detections Using Provenance Graph Structural FeaturesKunal Mukherjee, Joshua Wiedemeier, Tianhao Wang et al.
Advanced cyber threats (e.g., Fileless Malware and Advanced Persistent Threat (APT)) have driven the adoption of provenance-based security solutions. These solutions employ Machine Learning (ML) models for behavioral modeling and critical security tasks such as malware and anomaly detection. However, the opacity of ML-based security models limits their broader adoption, as the lack of transparency in their decision-making processes restricts explainability and verifiability. We tailored our solution towards Graph Neural Network (GNN)-based security solutions since recent studies employ GNNs to comprehensively digest system provenance graphs for security-critical tasks. To enhance the explainability of GNN-based security models, we introduce PROVEXPLAINER, a framework offering instance-level security-aware explanations using an interpretable surrogate model. PROVEXPLAINER's interpretable feature space consists of discriminant subgraph patterns and graph structural features, which can be directly mapped to the system provenance problem space, making the explanations human interpretable. We show how PROVEXPLAINER synergizes with current state-of-the-art (SOTA) GNN explainers to deliver domain and instance-specific explanations. We measure the explanation quality using the Fidelity+/Fidelity- metric as used by traditional GNN explanation literature, we incorporate the precision/recall metric, where we consider the accuracy of the explanation against the ground truth, and we designed a human actionability metric based on graph traversal distance. On real-world Fileless and APT datasets, PROVEXPLAINER achieves up to 29%/27%/25%/1.4x higher Fidelity+, precision, recall, and actionability (where higher values are better), and 12% lower Fidelity- (where lower values are better) when compared against SOTA GNN explainers.
40.9SIApr 29
MoltGraph: A Longitudinal Temporal Graph Dataset of Moltbook for Coordinated-Agent DetectionKunal Mukherjee, Cuneyt Gurcan Akcora, Murat Kantarcioglu
Agent-native social platforms such as Moltbook are rapidly emerging, yet they inherit and amplify classical influence and abuse attacks, where coordinated agents strategically comment and upvote to manipulate visibility and propagate narratives across communities. However, rigorous measurement and learning-based monitoring remain constrained by the absence of longitudinal, graph-native datasets for agentic social networks that jointly capture heterogeneous interactions, temporal drift, and visibility signals needed to connect coordination behavior to downstream exposure. We introduce MoltGraph as a realistic longitudinal agentic social-network graph dataset for studying how agents behave, coordinate, and evolve in the wild, enabling reproducible measurement on emerging multi-agent social ecosystems. Using MoltGraph, we provide the first graph-centric characterization of Moltbook as a dynamic network: (i) heavy-tailed connectivity with power-law exponents in the range alpha in [1.86, 2.72], (ii) accelerating hub formation and attention centralization where the top 1% agents account for 29.00% of engagements, (iii) bursty, short-lived coordination episodes, 98.33% last under 24 hours, and (iv) measurable exposure effects across submolts. In matched analyses, posts receiving coordinated engagement exhibit 506.35% higher early interaction rates (within H=5 days) and 242.63% higher downstream exposure in feeds than non-coordinated controls.
LGJan 30
Optimal Transport-Guided Adversarial Attacks on Graph Neural Network-Based Bot DetectionKunal Mukherjee, Zulfikar Alom, Tran Gia Bao Ngo et al.
The rise of bot accounts on social media poses significant risks to public discourse. To address this threat, modern bot detectors increasingly rely on Graph Neural Networks (GNNs). However, the effectiveness of these GNN-based detectors in real-world settings remains poorly understood. In practice, attackers continuously adapt their strategies as well as must operate under domain-specific and temporal constraints, which can fundamentally limit the applicability of existing attack methods. As a result, there is a critical need for robust GNN-based bot detection methods under realistic, constraint-aware attack scenarios. To address this gap, we introduce BOCLOAK to systematically evaluate the robustness of GNN-based social bot detection via both edge editing and node injection adversarial attacks under realistic constraints. BOCLOAK constructs a probability measure over spatio-temporal neighbor features and learns an optimal transport geometry that separates human and bot behaviors. It then decodes transport plans into sparse, plausible edge edits that evade detection while obeying real-world constraints. We evaluate BOCLOAK across three social bot datasets, five state-of-the-art bot detectors, three adversarial defenses, and compare it against four leading graph adversarial attack baselines. BOCLOAK achieves up to 80.13% higher attack success rates while using 99.80% less GPU memory under realistic real-world constraints. Most importantly, BOCLOAK shows that optimal transport provides a lightweight, principled framework for bridging the gap between adversarial attacks and real-world bot detection.
CRFeb 23
Red-Teaming Claude Opus and ChatGPT-based Security Advisors for Trusted Execution EnvironmentsKunal Mukherjee
Trusted Execution Environments (TEEs) (e.g., Intel SGX and ArmTrustZone) aim to protect sensitive computation from a compromised operating system, yet real deployments remain vulnerable to microarchitectural leakage, side-channel attacks, and fault injection. In parallel, security teams increasingly rely on Large Language Model (LLM) assistants as security advisors for TEE architecture review, mitigation planning, and vulnerability triage. This creates a socio-technical risk surface: assistants may hallucinate TEE mechanisms, overclaim guarantees (e.g., what attestation does and does not establish), or behave unsafely under adversarial prompting. We present a red-teaming study of two prevalently deployed LLM assistants in the role of TEE security advisors: ChatGPT-5.2 and Claude Opus-4.6, focusing on the inherent limitations and transferability of prompt-induced failures across LLMs. We introduce TEE-RedBench, a TEE-grounded evaluation methodology comprising (i) a TEE-specific threat model for LLM-mediated security work, (ii) a structured prompt suite spanning SGX and TrustZone architecture, attestation and key management, threat modeling, and non-operational mitigation guidance, along with policy-bound misuse probes, and (iii) an annotation rubric that jointly measures technical correctness, groundedness, uncertainty calibration, refusal quality, and safe helpfulness. We find that some failures are not purely idiosyncratic, transferring up to 12.02% across LLM assistants, and we connect these outcomes to secure architecture by outlining an "LLM-in-the-loop" evaluation pipeline: policy gating, retrieval grounding, structured templates, and lightweight verification checks that, when combined, reduce failures by 80.62%.
LGJul 28, 2025
PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge AttributesTianhao Wang, Simon Klancher, Kunal Mukherjee et al.
The rise of graph-structured data has driven interest in graph learning and synthetic data generation. While successful in text and image domains, synthetic graph generation remains challenging -- especially for real-world graphs with complex, heterogeneous schemas. Existing research has focused mostly on homogeneous structures with simple attributes, limiting their usefulness and relevance for application domains requiring semantic fidelity. In this research, we introduce ProvCreator, a synthetic graph framework designed for complex heterogeneous graphs with high-dimensional node and edge attributes. ProvCreator formulates graph synthesis as a sequence generation task, enabling the use of transformer-based large language models. It features a versatile graph-to-sequence encoder-decoder that 1. losslessly encodes graph structure and attributes, 2. efficiently compresses large graphs for contextual modeling, and 3. supports end-to-end, learnable graph generation. To validate our research, we evaluate ProvCreator on two challenging domains: system provenance graphs in cybersecurity and knowledge graphs from IntelliGraph Benchmark Dataset. In both cases, ProvCreator captures intricate dependencies between structure and semantics, enabling the generation of realistic and privacy-aware synthetic datasets.
IRFeb 12, 2025
Z-REx: Human-Interpretable GNN Explanations for Real Estate RecommendationsKunal Mukherjee, Zachary Harrison, Saeid Balaneshin
Transparency and interpretability are crucial for enhancing customer confidence and user engagement, especially when dealing with black-box Machine Learning (ML)-based recommendation systems. Modern recommendation systems leverage Graph Neural Network (GNN) due to their ability to produce high-quality recommendations in terms of both relevance and diversity. Therefore, the explainability of GNN is especially important for Link Prediction (LP) tasks since recommending relevant items can be viewed as predicting links between users and items. GNN explainability has been a well-studied field, but existing methods primarily focus on node or graph-level tasks, leaving a gap in LP explanation techniques. This work introduces Z-REx, a GNN explanation framework designed explicitly for heterogeneous link prediction tasks. Z-REx utilizes structural and attribute perturbation to identify critical substructures and important features while reducing the search space by leveraging domain-specific knowledge. In our experimentation, we show the efficacy of Z-REx in generating contextually relevant and human-interpretable explanations for ZiGNN, a GNN-based recommendation engine, using a real-world real-estate dataset from Zillow Group, Inc. We compare against State-of-The-Art (SOTA) GNN explainers to show Z-REx outperforms them by 61% in the Fidelity metric by producing superior human-interpretable explanations.