IR CL CR

Apr 16, 2015

Towards a relation extraction framework for cyber-security concepts

Corinne L. Jones, Robert A. Bridges, Kelly Huffer, John Goodall

arXiv:1504.04317v1

89 citations

Originality: Incremental advance

AI Analysis

This work addresses the need for information retrieval methods tailored to cybersecurity, where labeled data is scarce. It is an incremental advance, since it builds on existing semi-supervised NLP approaches.

The researchers tackled the problem of extracting security entities and relationships from text to assist security analysts, achieving a precision of 0.82 in preliminary testing on a small corpus.

In order to assist security analysts in obtaining information pertaining to their network, such as novel vulnerabilities, exploits, or patches, information retrieval methods tailored to the security domain are needed. As labeled text data is scarce and expensive, we follow developments in semi-supervised Natural Language Processing and implement a bootstrapping algorithm for extracting security entities and their relationships from text. The algorithm requires little input data, specifically, a few relations or patterns (heuristics for identifying relations), and incorporates an active learning component which queries the user on the most important decisions to prevent drifting from the desired relations. Preliminary testing on a small corpus shows promising results, obtaining precision of .82.
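The bootstrapping loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how patterns are represented or which decisions are sent to the user, so everything below is an assumption made for illustration. Patterns are taken to be the literal text between two entities, a candidate relation is auto-accepted when at least `min_support` patterns agree on it, and otherwise a hypothetical `ask_user` callback stands in for the active learning query. The product names and CVE-style identifiers are made-up toy data.

```python
import re
from typing import Callable, Dict, Iterable, List, Set, Tuple

Pair = Tuple[str, str]  # (entity, entity), e.g. (software, vulnerability)


def find_patterns(sentences: Iterable[str], pairs: Set[Pair]) -> Set[str]:
    """Collect the literal text found between the entities of known pairs."""
    patterns = set()
    for sentence in sentences:
        for left, right in pairs:
            match = re.search(re.escape(left) + r"(.+?)" + re.escape(right), sentence)
            if match:
                patterns.add(match.group(1))
    return patterns


def apply_patterns(sentences: Iterable[str], patterns: Set[str]) -> Dict[Pair, Set[str]]:
    """Map each candidate pair to the set of patterns that matched it."""
    support: Dict[Pair, Set[str]] = {}
    for sentence in sentences:
        for pattern in patterns:
            match = re.search(r"(\S+)" + re.escape(pattern) + r"(\S+)", sentence)
            if match:
                pair = (match.group(1), match.group(2).rstrip(".,;"))
                support.setdefault(pair, set()).add(pattern)
    return support


def bootstrap(sentences: List[str], seeds: Set[Pair],
              ask_user: Callable[[Pair], bool],
              rounds: int = 3, min_support: int = 2) -> Set[Pair]:
    """Alternate between learning patterns and extracting new pairs."""
    known, rejected = set(seeds), set()
    for _ in range(rounds):
        patterns = find_patterns(sentences, known)
        new = set()
        for pair, matched in apply_patterns(sentences, patterns).items():
            if pair in known or pair in rejected:
                continue
            # Assumed rule: weakly supported candidates go to the user,
            # which is what keeps the loop from drifting.
            if len(matched) >= min_support or ask_user(pair):
                new.add(pair)
            else:
                rejected.add(pair)
        if not new:
            break
        known |= new
    return known


# Toy corpus with one sentence that would cause drift if accepted.
corpus = [
    "ProductA is vulnerable to CVE-0000-0001.",
    "ProductB is vulnerable to CVE-0000-0002.",
    "Alice is vulnerable to flattery.",
]
asked = []


def oracle(pair: Pair) -> bool:
    """Stand-in for the analyst: accept only CVE-style identifiers."""
    asked.append(pair)
    return pair[1].startswith("CVE-")


result = bootstrap(corpus, {("ProductA", "CVE-0000-0001")}, oracle)
print(sorted(result))
```

In this toy run the single seed yields the pattern " is vulnerable to ", which proposes two new candidates. Both have only one supporting pattern, so both are sent to the stand-in oracle, which accepts the CVE-style pair and rejects the off-topic one. A real system would need entity recognition and a more robust pattern representation than raw substrings.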
