DBCRJan 9

Descriptor: Multi-Regional Cloud Honeypot Dataset (MURHCAD)

arXiv:2601.05813h-index: 36
Originality Synthesis-oriented
AI Analysis

For cybersecurity researchers, this dataset provides a high-resolution, multi-regional resource for studying global attack patterns and testing defensive strategies, though it is a data contribution rather than a novel method.

This paper presents MURHCAD, a 72-hour honeypot dataset from Microsoft Azure with 132,425 attack events, revealing that 2,438 source IPs from 95 countries are dominated by SIP, Telnet, and SMB protocols, with temporal peaks at 07:00 and 23:00 UTC and platform-specific geographic biases.

This data article introduces a comprehensive, high-resolution honeynet dataset designed to support standalone analyses of global cyberattack behaviors. Collected over a continuous 72-hour window (June 9 to 11, 2025) on Microsoft Azure, the dataset comprises 132,425 individual attack events captured by three honeypots (Cowrie, Dionaea, and SentryPeer) deployed across four geographically dispersed virtual machines. Each event record includes enriched metadata (UTC timestamps, source/destination IPs, autonomous system and organizational mappings, geolocation coordinates, targeted ports, and honeypot identifiers alongside derived temporal features and standardized protocol classifications). We provide actionable guidance for researchers seeking to leverage this dataset in anomaly detection, protocol-misuse studies, threat intelligence, and defensive policy design. Descriptive statistics highlight significant skew: 2,438 unique source IPs span 95 countries, yet the top 1% of IPs account for 1% of all events, and three protocols dominate: Session Initiation Protocol (SIP), Telnet, Server Message Block (SMB). Temporal analysis uncovers pronounced rush-hour peaks at 07:00 and 23:00 UTC, interspersed with maintenance-induced gaps that reveal operational blind spots. Geospatial mapping further underscores platform-specific biases: SentryPeer captures concentrated SIP floods in North America and Southeast Asia, Cowrie logs Telnet/SSH scans predominantly from Western Europe and the U.S., and Dionaea records SMB exploits around European nodes. By combining fine-grained temporal resolution with rich, contextual geolocation and protocol metadata, this standalone dataset aims to empower reproducible, cloud-scale investigations into evolving cyber threats. Accompanying analysis code and data access details are provided.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes