CR, IR · Feb 10, 2021

Malware Knowledge Graph Generation

Sharmishtha Dutta, Nidhi Rastogi, Destin Yee, Chuqiao Gu, Qicheng Ma

arXiv:2102.05583v1 · 6.65 citations · h-index: 14 · Has Code

Originality: Incremental advance

AI Analysis

This work addresses the lack of open-source knowledge graphs in the security domain, enabling more efficient threat intelligence extraction with less reliance on security experts.

The authors address the problem of unstructured cyber threat information by building TINKER, a knowledge graph for threat intelligence, from RDF triples extracted from 83 threat reports published between 2006 and 2021. The result is a structured representation for downstream tasks such as predicting missing information and future threat trends.

Cyber threat and attack intelligence information is available in non-standard formats from heterogeneous sources. Comprehending it and using it for threat intelligence extraction requires engaging security experts. Knowledge graphs convert this unstructured information from heterogeneous sources into a structured representation of data and factual knowledge for several downstream tasks, such as predicting missing information and future threat trends. Existing large-scale knowledge graphs mainly focus on general classes of entities and the relationships between them. Open-source knowledge graphs for the security domain do not exist. To fill this gap, we built TINKER (Threat INtelligence KnowlEdge gRaph), a knowledge graph for threat intelligence. TINKER is generated using RDF triples that describe entities and relations from tokenized, unstructured natural language text in 83 threat reports published between 2006 and 2021. We built TINKER using classes and properties defined by an open-source malware ontology and using hand-annotated RDF triples. We also discuss ongoing research and the challenges faced while creating TINKER.
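To make the idea of RDF triples concrete, here is a minimal sketch of how facts from a threat report could be held as subject-predicate-object triples and queried. It is an illustration only, not the authors' pipeline: the `ex:` entity and property names, the `Triple` type, and the helper functions are all hypothetical and are not taken from TINKER or its malware ontology.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical hand-annotated triples. All names are placeholders,
# not classes or properties from the TINKER ontology.
TRIPLES = [
    Triple("ex:ExampleMalware", "rdf:type", "ex:Malware"),
    Triple("ex:ExampleMalware", "ex:targets", "ex:ExampleOrg"),
    Triple("ex:ExampleMalware", "ex:usesVulnerability", "ex:ExampleVuln"),
    Triple("ex:ExampleGroup", "rdf:type", "ex:ThreatActor"),
    Triple("ex:ExampleGroup", "ex:uses", "ex:ExampleMalware"),
]


def build_graph(triples):
    """Index triples by subject so each entity maps to its outgoing edges."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.subject].append((t.predicate, t.obj))
    return dict(graph)


def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [
        t.obj
        for t in triples
        if t.subject == subject and t.predicate == predicate
    ]


graph = build_graph(TRIPLES)
print(objects(TRIPLES, "ex:ExampleGroup", "ex:uses"))
```

A query that returns an empty list, such as asking for a property no report mentioned, marks the kind of gap that a downstream task like predicting missing information would try to fill.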
