CL · LG · Jun 1, 2025

A Word is Worth 4-bit: Efficient Log Parsing with Binary Coded Decimal Recognition

arXiv:2506.01147v1
h-index: 2
2025 IEEE International Conference on Web Services (ICWS)
Originality: Incremental advance
AI Analysis

The paper addresses suboptimal log-parsing accuracy in downstream tasks such as pattern identification. The contribution appears incremental, since it builds on existing parsing methods.

The paper tackles the problem of fine-grained log parsing for system-generated logs by proposing a character-level neural parser that uses binary-coded decimal recognition. The result is a low-resource parser that matches LLM-based parsers in accuracy and outperforms semantic parsers in efficiency on Loghub-2k and industrial datasets.

System-generated logs are typically converted into categorical log templates through parsing. These templates are crucial for generating actionable insights in various downstream tasks. However, existing parsers often fail to capture fine-grained template details, leading to suboptimal accuracy and reduced utility in downstream tasks that require precise pattern identification. We propose a character-level log parser utilizing a novel neural architecture that aggregates character embeddings. Our approach estimates a sequence of binary-coded decimals to achieve highly granular log template extraction. Our low-resource character-level parser, tested on the revised Loghub-2k and a manually annotated industrial dataset, matches LLM-based parsers in accuracy while outperforming semantic parsers in efficiency.
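
For readers unfamiliar with the term, binary-coded decimal (BCD) stores each decimal digit in its own 4-bit group. The sketch below illustrates only that standard encoding. The abstract does not describe how the parser maps characters or words to such codes, so this is not the paper's architecture, and the function names `to_bcd` and `from_bcd` are hypothetical.

```python
def to_bcd(n: int) -> list[str]:
    """Encode a non-negative integer as 4-bit groups, one per decimal digit.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    # Each decimal digit 0-9 becomes its own zero-padded 4-bit binary string.
    return [format(int(digit), "04b") for digit in str(n)]


def from_bcd(groups: list[str]) -> int:
    """Decode a list of 4-bit groups back into the integer they spell out."""
    if not groups:
        raise ValueError("at least one 4-bit group is required")
    digits = []
    for group in groups:
        value = int(group, 2)
        # The bit patterns 1010-1111 do not correspond to a decimal digit.
        if len(group) != 4 or value > 9:
            raise ValueError(f"not a valid BCD digit: {group!r}")
        digits.append(str(value))
    return int("".join(digits))


if __name__ == "__main__":
    print(to_bcd(2025))  # ['0010', '0000', '0010', '0101']
    print(from_bcd(["0010", "0000", "0010", "0101"]))  # 2025
```

Each group carries exactly one decimal digit, so a sequence of such groups can be read back digit by digit without any carry or positional arithmetic.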

