Qian Cui

h-index17
2papers

2 Papers

71.5CRMay 25
TTPrint: Evidence-Grounded TTP Extraction via Diverge-then-Converge Verification

Yutong Cheng, Changze Li, Raihan Sultan Pasha Basuki et al.

Extracting MITRE ATT&CK techniques from cyber threat intelligence (CTI) reports is an open-set, multi-label problem requiring both high recall (not missing techniques) and high precision (not hallucinating unsupported ones). Existing methods--rule-based, supervised, and LLM-based--struggle to achieve both: rule-based and supervised approaches lack generalizability across diverse attack descriptions, while LLM-based approaches that couple candidate generation and validation within a single inference step suffer from limited recall and precision simultaneously. We propose TTPrint, which addresses this challenge through a diverge-then-converge design inspired by how human analysts work: first extracting broadly, then verifying rigorously. In the divergent phase, reports are decomposed into atomic behaviors and candidate techniques are proposed broadly. A deterministic span localization stage then anchors each candidate to a specific evidence window in the source text. A convergent verification stage retains only candidates supported by both the localized evidence and the authoritative MITRE definition. We contribute two evaluation resources--a cleaned TRAM benchmark (TRAM-Clean) and a new annotated dataset (TTPrint-Bench)--to address known annotation noise in existing benchmarks and elevate the task to document-level TTP extraction. On TRAM-Clean and TTPrint-Bench, TTPrint achieves 76.48% and 87.39% macro-F1 respectively, outperforming the leading baseline by 63.5% and 29.4%. A multi-backbone analysis across six LLMs and a threshold sensitivity study further demonstrate generalizability across model choices and provide practical guidance for parameter selection.

LGSep 10, 2025
Contextual Learning for Anomaly Detection in Tabular Data

Spencer King, Zhilu Zhang, Ruofan Yu et al.

Anomaly detection is critical in domains such as cybersecurity and finance, especially when working with large-scale tabular data. Yet, unsupervised anomaly detection-where no labeled anomalies are available-remains challenging because traditional deep learning methods model a single global distribution, assuming all samples follow the same behavior. In contrast, real-world data often contain heterogeneous contexts (e.g., different users, accounts, or devices), where globally rare events may be normal within specific conditions. We introduce a contextual learning framework that explicitly models how normal behavior varies across contexts by learning conditional data distributions $P(\mathbf{Y} \mid \mathbf{C})$ rather than a global joint distribution $P(\mathbf{X})$. The framework encompasses (1) a probabilistic formulation for context-conditioned learning, (2) a principled bilevel optimization strategy for automatically selecting informative context features using early validation loss, and (3) theoretical grounding through variance decomposition and discriminative learning principles. We instantiate this framework using a novel conditional Wasserstein autoencoder as a simple yet effective model for tabular anomaly detection. Extensive experiments across eight benchmark datasets demonstrate that contextual learning consistently outperforms global approaches-even when the optimal context is not intuitively obvious-establishing a new foundation for anomaly detection in heterogeneous tabular data.