CR, AI, CL
Jan 8, 2025

Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction

arXiv:2501.06239v1
8 citations
h-index: 1
2024 2nd International Conference on Foundation and Large Language Models (FLLM)
AI Analysis

This paper addresses the scarcity of annotated data that organizations and institutions face in cybersecurity, providing a flexible solution that works in both low-resource and data-rich environments.

The paper tackles the problem of extracting cyber threat intelligence from diverse data formats by introducing 0-CTI, a scalable AI framework that supports both supervised and zero-shot learning for entity and relation extraction, with the supervised entity extractor surpassing current state-of-the-art performance.

Cyber Threat Intelligence (CTI) is critical for mitigating threats to organizations, governments, and institutions, yet the necessary data are often dispersed across diverse formats. AI-driven solutions for CTI Information Extraction (IE) typically depend on high-quality, annotated data, which are not always available. This paper introduces 0-CTI, a scalable AI-based framework designed for efficient CTI Information Extraction. Leveraging advanced Natural Language Processing (NLP) techniques, particularly Transformer-based architectures, the proposed system processes complete text sequences of CTI reports to extract a cyber ontology of named entities and their relationships. Our contribution is the development of 0-CTI, the first modular framework for CTI Information Extraction that supports both supervised and zero-shot learning. Unlike existing state-of-the-art models that rely heavily on annotated datasets, our system enables fully dataless operation through zero-shot methods for both Entity and Relation Extraction, making it adaptable to various data availability scenarios. Additionally, our supervised Entity Extractor surpasses current state-of-the-art performance in cyber Entity Extraction, highlighting the dual strength of the framework in both low-resource and data-rich environments. By aligning the system's outputs with the Structured Threat Information Expression (STIX) format, a standard for information exchange in the cybersecurity domain, 0-CTI standardizes extracted knowledge, enhancing communication and collaboration in cybersecurity operations.
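
The final step described above, aligning extracted entities and relations with STIX, can be illustrated with a minimal sketch. This is not the 0-CTI implementation: the label names, the `LABEL_TO_STIX` mapping, the helper functions, and the example entities are all hypothetical, and the output is a simplified STIX 2.1-style bundle that has not been validated against the full specification.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical mapping from extractor labels to STIX 2.1 object types.
LABEL_TO_STIX = {
    "THREAT_ACTOR": "threat-actor",
    "MALWARE": "malware",
    "TOOL": "tool",
}


def utc_timestamp():
    """Return the current UTC time with millisecond precision and a 'Z' suffix."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"


def to_stix_object(label, name, timestamp):
    """Wrap one extracted entity in a STIX-style domain object."""
    stix_type = LABEL_TO_STIX[label]
    obj = {
        "type": stix_type,
        "spec_version": "2.1",
        "id": f"{stix_type}--{uuid.uuid4()}",
        "created": timestamp,
        "modified": timestamp,
        "name": name,
    }
    if stix_type == "malware":
        # STIX 2.1 requires is_family on malware objects; False is an assumed value.
        obj["is_family"] = False
    return obj


def to_stix_relationship(source, target, relationship_type, timestamp):
    """Wrap one extracted relation in a STIX-style relationship object."""
    return {
        "type": "relationship",
        "spec_version": "2.1",
        "id": f"relationship--{uuid.uuid4()}",
        "created": timestamp,
        "modified": timestamp,
        "relationship_type": relationship_type,
        "source_ref": source["id"],
        "target_ref": target["id"],
    }


def build_bundle(entities, relations):
    """Build a bundle from (label, name) entities and (source, type, target) relations."""
    timestamp = utc_timestamp()
    by_name = {name: to_stix_object(label, name, timestamp) for label, name in entities}
    links = [
        to_stix_relationship(by_name[src], by_name[dst], rel, timestamp)
        for src, rel, dst in relations
    ]
    return {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid4()}",
        "objects": list(by_name.values()) + links,
    }


if __name__ == "__main__":
    # Made-up extractor output, for illustration only.
    entities = [("THREAT_ACTOR", "ExampleGroup"), ("MALWARE", "ExampleRAT")]
    relations = [("ExampleGroup", "uses", "ExampleRAT")]
    print(json.dumps(build_bundle(entities, relations), indent=2))
```

Keeping the label-to-type mapping in a single dictionary means the same conversion step could sit behind either a supervised or a zero-shot extractor, provided both emit the same label set.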
