AI · CR · LG · Nov 11, 2024

Towards Characterizing Cyber Networks with Large Language Models

arXiv:2411.07089v1
Originality: Synthesis-oriented
AI Analysis

This work addresses threat hunting in cybersecurity by analyzing high-dimensional network data, but it appears incremental as it builds on existing methods without clear SOTA results.

The paper tackles the detection of adversarial activity in cyber networks with a prototype tool, CLEM, which applies natural language modeling to network traffic logs. It evaluates the approach with the Adjusted Rand Index and reports that it shows promise.

Threat hunting analyzes large, noisy, high-dimensional data to find sparse adversarial behavior. We believe adversarial activities, however they are disguised, are extremely difficult to completely obscure in high-dimensional space. In this paper, we employ these latent features of cyber data to find anomalies via a prototype tool called Cyber Log Embeddings Model (CLEM). CLEM was trained on Zeek network traffic logs from both a real-world production network and an Internet of Things (IoT) cybersecurity testbed. The model is deliberately overtrained on a sliding window of data to characterize each window closely. We use the Adjusted Rand Index (ARI) to compare the k-means clustering of CLEM output to expert labeling of the embeddings. Our approach demonstrates that there is promise in using natural language modeling to understand cyber data.
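The ARI comparison mentioned in the abstract can be illustrated with a minimal sketch. It computes the standard pair-counting Adjusted Rand Index between two labelings of the same items using only the Python standard library. The function name `adjusted_rand_index` and the toy label lists are hypothetical and are not taken from the paper; the paper's actual clustering pipeline and data are not reproduced here.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting Adjusted Rand Index between two labelings.

    Illustrative helper (name and signature are assumptions, not the
    paper's code). Label values are arbitrary hashable identifiers.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions trivially agree

    # Pairs placed together in both labelings (contingency-table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each labeling on its own (row/column sums).
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2         # upper bound on agreement

    if max_index == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy data: k-means cluster ids vs. expert labels for six embeddings.
kmeans_ids = [0, 0, 1, 1, 2, 2]
expert_labels = ["scan", "scan", "benign", "benign", "iot", "iot"]
score = adjusted_rand_index(kmeans_ids, expert_labels)
```

An ARI of 1 means the two partitions agree exactly up to a renaming of cluster ids, values near 0 indicate chance-level agreement, and negative values indicate less agreement than chance.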

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.
