Ali Shoker

CR
h-index9
6papers
2citations
Novelty38%
AI Score47

6 Papers

CRJun 4
GenTI: Benchmarking LLMs for Autonomous IDPS Rule Generation for Unseen Attacks

Hassan Jalil Hadi, Rehana Yasmin, Ali Shoker

Rule-based Intrusion Detection and Prevention Systems (IDPS) offer precise attack detection as well as mitigation, however their manually crafted, signature-driven rules limit adaptability to emerging and zero-day threats. Additionally, existing public datasets (e.g., CICIDS2017, UNSW-NB15) focus on traffic classification and provide little structured information to support automatic rule synthesis or prevention logic. To address this gap, we propose Generative Thread Intelligence (GenTI) \footnote{GenTI refers to the proposed framework, and GTI refers to the dataset.} an LLM-driven benchmark for automatic generation of IDPS rules targeting unseen attacks. The dataset (GTI) aggregates over 150k detection and prevention rules from Snort, Suricata, Emerging Threats, as well as 50k YARA, each annotated with protocol behavior, payload signatures, contextual relationships, mappings to Cyber Threat Intelligence (CTI), along with actionable response types (alert, drop, reject). Moreover, on top of this corpus we design an LLM-based pipeline that transforms analyst prompts and representative payloads into deployable rules via structured prompt engineering, Chain-of-Thought (CoT) reasoning, as well as a Chain-of-Verification (CoVe) loop for syntactic, semantic, and security validation. The generated rules are executed in real time on (Snort/Suricata) and evaluated by syntax accuracy, semantic similarity, CTI coverage, security effectiveness as well as unseen attacks detection. Furthermore, our GenTI instantiation achieves a composite rule-quality score of 89.4\%, with 94.8\% CTI coverage, improving unseen attacks detection from 45\% to 87.4\% and reducing the false-positive rate from 8.5\% to 2.3\%. Overall, GenTI establishes the first large-scale benchmark that tightly couples rule-level CTI with LLM-based automation, enabling adaptive, self-evolving IDPS.

CRMay 7
LCC-LLM: Leveraging Code-Centric Large Language Models for Malware Attribution

Christopher G. Pedraza Pohlenz, Hassan Jalil Hadi, Ali Hassan et al.

LLMs are increasingly explored for malware analysis; however, current LLM-based malware attribution remains limited by unsupported indicators and insufficient code-level grounding for identifying malicious and vulnerable code segments. To address these limitations, this research introduces LCC-LLM, a code-centric benchmark dataset and evidence-grounded framework for malware attribution and multi-task static malware analysis. The proposed LCCD dataset contains approximately 34K PE samples processed through a large-scale reverse-engineering pipeline and represented using decompiled C code, assembly code, CFG/FCG artifacts, hexadecimal data, PE metadata, suspicious API evidence, and structural features. Beyond dataset construction, LCC-LLM integrates LangGraph-orchestrated static analysis with multi-source cybersecurity knowledge to support evidence-grounded malware reasoning. The framework employs a seven-layer retrieval-augmented generation pipeline, CoVe for IoC validation, and a multi-dimensional quality gate to improve factual reliability and analyst-oriented decision support. Curriculum-ordered instruction data is used to fine-tune DeepSeek-R1-Distill-Qwen-14B and Qwen3-Coder-30B-A3B using QLoRA. Evaluation across 43 malware-analysis task types achieves an average semantic similarity of 0.634, with the highest task-level performance in structured report generation, IoC extraction, vulnerability assessment, malware configuration extraction, and malware class detection. In a real-world case study using MalwareBazaar samples, the grounded pipeline achieves a 10/10 structured analysis pass rate, producing CFG/FCG evidence, MITRE ATT&CK mappings, detection guidance, and analyst-ready reports. These results show that code-centric representations, retrieval grounding, and verification-guided reasoning improve the reliability and operational usefulness of LLM-assisted malware attribution.

AIMar 16
CRASH: Cognitive Reasoning Agent for Safety Hazards in Autonomous Driving

Erick Silva, Rehana Yasmin, Ali Shoker

As AVs grow in complexity and diversity, identifying the root causes of operational failures has become increasingly complex. The heterogeneity of system architectures across manufacturers, ranging from end-to-end to modular designs, together with variations in algorithms and integration strategies, limits the standardization of incident investigations and hinders systematic safety analysis. This work examines real-world AV incidents reported in the NHTSA database. We curate a dataset of 2,168 cases reported between 2021 and 2025, representing more than 80 million miles driven. To process this data, we introduce CRASH, Cognitive Reasoning Agent for Safety Hazards, an LLM-based agent that automates reasoning over crash reports by leveraging both standardized fields and unstructured narrative descriptions. CRASH operates on a unified representation of each incident to generate concise summaries, attribute a primary cause, and assess whether the AV materially contributed to the event. Our findings show that (1) CRASH attributes 64% of incidents to perception or planning failures, underscoring the importance of reasoning-based analysis for accurate fault attribution; and (2) approximately 50% of reported incidents involve rear-end collisions, highlighting a persistent and unresolved challenge in autonomous driving deployment. We further validate CRASH with five domain experts, achieving 86% accuracy in attributing AV system failures. Overall, CRASH demonstrates strong potential as a scalable and interpretable tool for automated crash analysis, providing actionable insights to support safety research and the continued development of autonomous driving systems.

CRMay 7
Toward Space-Based Public Key Systems: Enabling Secure Space Communications through In-Orbit Trust Services

Rehana Yasmin, Paulo Esteves-Verissimo, Ali Shoker

The New Space era has led to a rapid increase in satellites operated by independent entities in near-Earth orbit. This shift enables richer space services but also requires secure, near-real-time coordination, making efficient authentication of space assets critical for next-generation missions. Traditional ground-dependent Public Key Infrastructure (PKI) suffers from latency and operational bottlenecks that limit scalability and availability in dynamic space environments. This paper proposes architectural designs for space-based PKI that shift certificate management and validation from ground infrastructure into space, reducing reliance on ground stations while enabling interoperability and cross-entity collaboration. Two deployment schemes are introduced: a space-ground integrated PKI with in-orbit validation authorities, and a fully autonomous space-based PKI with in-space issuance and validation. We analyze deployment trade-offs in scalability, availability, security, cost, and operational complexity in multi-operator environments. A baseline latency analysis is provided to illustrate performance implications of in-orbit trust management.

CLOct 11, 2025
A-IPO: Adaptive Intent-driven Preference Optimization

Wenqing Wang, Muhammad Asif Ali, Ali Shoker et al.

Human preferences are diverse and dynamic, shaped by regional, cultural, and social factors. Existing alignment methods like Direct Preference Optimization (DPO) and its variants often default to majority views, overlooking minority opinions and failing to capture latent user intentions in prompts. To address these limitations, we introduce \underline{\textbf{A}}daptive \textbf{\underline{I}}ntent-driven \textbf{\underline{P}}reference \textbf{\underline{O}}ptimization (\textbf{A-IPO}). Specifically,A-IPO introduces an intention module that infers the latent intent behind each user prompt and explicitly incorporates this inferred intent into the reward function, encouraging stronger alignment between the preferred model's responses and the user's underlying intentions. We demonstrate, both theoretically and empirically, that incorporating an intention--response similarity term increases the preference margin (by a positive shift of $λ\,Δ\mathrm{sim}$ in the log-odds), resulting in clearer separation between preferred and dispreferred responses compared to DPO. For evaluation, we introduce two new benchmarks, Real-pref, Attack-pref along with an extended version of an existing dataset, GlobalOpinionQA-Ext, to assess real-world and adversarial preference alignment. Through explicit modeling of diverse user intents,A-IPO facilitates pluralistic preference optimization while simultaneously enhancing adversarial robustness in preference alignment. Comprehensive empirical evaluation demonstrates that A-IPO consistently surpasses existing baselines, yielding substantial improvements across key metrics: up to +24.8 win-rate and +45.6 Response-Intention Consistency on Real-pref; up to +38.6 Response Similarity and +52.2 Defense Success Rate on Attack-pref; and up to +54.6 Intention Consistency Score on GlobalOpinionQA-Ext.

AIFeb 8, 2024
Savvy: Trustworthy Autonomous Vehicles Architecture

Ali Shoker, Rehana Yasmin, Paulo Esteves-Verissimo

The increasing interest in Autonomous Vehicles (AV) is notable due to business, safety, and performance reasons. While there is salient success in recent AV architectures, hinging on the advancements in AI models, there is a growing number of fatal incidents that impedes full AVs from going mainstream. This calls for the need to revisit the fundamentals of building safety-critical AV architectures. However, this direction should not deter leveraging the power of AI. To this end, we propose Savvy, a new trustworthy intelligent AV architecture that achieves the best of both worlds. Savvy makes a clear separation between the control plane and the data plane to guarantee the safety-first principles. The former assume control to ensure safety using design-time defined rules, while launching the latter for optimizing decisions as much as possible within safety time-bounds. This is achieved through guided Time-aware predictive quality degradation (TPQD): using dynamic ML models that can be tuned to provide either richer or faster outputs based on the available safety time bounds. For instance, Savvy allows to safely identify an elephant as an obstacle (a mere object) the earliest possible, rather than optimally recognizing it as an elephant when it is too late. This position paper presents the Savvy's motivations and concept, whereas empirical evaluation is a work in progress.