SEMar 21

LogFold: Compressing Logs with Structured Tokens and Hybrid Encoding

arXiv:2603.2061846.7h-index: 18
AI Analysis

This work addresses the challenge of managing rapidly growing log volumes for software organizations, offering a domain-specific incremental improvement in compression methods.

The paper tackles the problem of compressing log data efficiently by identifying redundancies in structured tokens and using a type-aware encoding strategy, resulting in LogFold achieving an average compression ratio improvement of 11.11% over state-of-the-art baselines with a compression speed of 9.842 MB/s.

Logs are essential for diagnosing failures and conducting retrospective studies, leading many software organizations to retain log messages for a long time. Nevertheless, the volume of generated log data grows rapidly as software systems grow, necessitating an effective compression method. Apart from general-purpose compressors (e.g., Gzip, Bzip2), many recent studies developed log-specific compression algorithms, but they offer suboptimal performance because of (1) overlooking redundancies within certain complex tokens, and (2) lacking a fine-grained encoding strategy for diverse token types. This work uncovers a new redundancy pattern in structured tokens and proposes a new type-aware encoding strategy to improve log compression. Building on this insight, we introduce LogFold, a novel log compression method consisting of four components: a token analyzer to classifies tokens as structured, unstructured, or static types; a processor that mines recurring patterns within structured tokens based on their delimiter skeletons; a hybrid encoder that tailors data representation according to token types; and a packer that compresses the output into an archive file. Extensive experiments on 16 public log datasets demonstrate that LogFold surpasses state-of-the-art baselines, achieving average compression ratio improvements by 11.11%, with a compression speed of 9.842 MB/s. Ablation studies further indicate the importance of each component. We also conduct sensitivity analyses to verify LogFold's robustness and stability across various internal settings.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes