IR · CL · CR · Apr 16, 2015

Towards a relation extraction framework for cyber-security concepts

arXiv:1504.04317v1 · 89 citations
Originality: Incremental advance
AI Analysis

This work addresses the need for information retrieval methods tailored to cybersecurity, where labeled data is scarce. The advance is incremental, since it builds on existing semi-supervised NLP approaches.

The researchers tackled the problem of extracting security entities and relationships from text to assist security analysts, achieving a precision of 0.82 in preliminary testing on a small corpus.

In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drifting from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining precision of .82.
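
The bootstrapping loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`find_patterns`, `apply_patterns`, `bootstrap`), the toy corpus, the pattern definition (the raw text between two entities), and the `oracle` callback standing in for the user are all assumptions made for illustration. The active learning step is also simplified: every round, the most frequent new candidates are shown to the oracle, and anything not confirmed is left out.

```python
import re
from collections import Counter


def find_patterns(sentences, relations):
    """Learn patterns: the text found between two entities of a known relation."""
    patterns = Counter()
    for sentence in sentences:
        for left, right in relations:
            m = re.search(re.escape(left) + r"(.+?)" + re.escape(right), sentence)
            if m:
                patterns[m.group(1)] += 1
    return patterns


def apply_patterns(sentences, patterns):
    """Propose candidate relations: token pairs joined by a learned pattern."""
    candidates = Counter()
    for sentence in sentences:
        for pattern in patterns:
            m = re.search(r"(\S+)" + re.escape(pattern) + r"(\S+)", sentence)
            if m:
                pair = (m.group(1), m.group(2).rstrip(".,"))
                candidates[pair] += 1
    return candidates


def bootstrap(sentences, seeds, oracle, rounds=5, queries_per_round=2):
    """Alternate between learning patterns and extracting relations.

    `oracle(pair)` is a hypothetical stand-in for the user: it returns True
    to accept a candidate relation and False to reject it.
    """
    relations = set(seeds)
    rejected = set()
    for _ in range(rounds):
        patterns = find_patterns(sentences, relations)
        candidates = apply_patterns(sentences, patterns)
        new = [c for c in candidates if c not in relations and c not in rejected]
        if not new:
            break  # nothing left to learn from this corpus
        # Simplified active learning: query the most frequent candidates first.
        new.sort(key=lambda c: (-candidates[c], c))
        for pair in new[:queries_per_round]:
            if oracle(pair):
                relations.add(pair)
            else:
                rejected.add(pair)  # rejected pairs never seed new patterns
    return relations


# Toy corpus (invented for this example).
corpus = [
    "CVE-2014-0160 affects OpenSSL.",
    "CVE-2014-6271 affects Bash.",
    "CVE-2014-6271 was found in Bash.",
    "Malware was found in Russia.",
]
seeds = {("CVE-2014-0160", "OpenSSL")}
asked = []


def toy_oracle(pair):
    """Pretend user: only accepts relations whose first entity is a CVE id."""
    asked.append(pair)
    return pair[0].startswith("CVE-")


found = bootstrap(corpus, seeds, toy_oracle)
```

In this toy run the seed teaches the pattern " affects ", which yields a second relation; that relation in turn teaches " was found in ", which proposes the off-target pair ("Malware", "Russia"). The oracle's rejection keeps that pair out of the relation set, which is the drift the abstract's active learning component is meant to prevent.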

Code Implementations: 1 repo
