AIDec 20, 2022

MDL-based Compressing Sequential Rules

arXiv:2212.10252v11 citationsh-index: 11
Originality Synthesis-oriented
AI Analysis

This addresses a storage and efficiency issue for data mining practitioners, but it is incremental as it applies existing MDL principles to a new context.

The paper tackles the problem of compressing sequential rule mining (SRM) results to reduce data size for storage and transmission, proposing a solution called ComSR based on the Minimum Description Length (MDL) principle, which successfully finds compact and meaningful sequential rules in experiments on a real dataset.

Nowadays, with the rapid development of the Internet, the era of big data has come. The Internet generates huge amounts of data every day. However, extracting meaningful information from massive data is like looking for a needle in a haystack. Data mining techniques can provide various feasible methods to solve this problem. At present, many sequential rule mining (SRM) algorithms are presented to find sequential rules in databases with sequential characteristics. These rules help people extract a lot of meaningful information from massive amounts of data. How can we achieve compression of mined results and reduce data size to save storage space and transmission time? Until now, there has been little research on the compression of SRM. In this paper, combined with the Minimum Description Length (MDL) principle and under the two metrics (support and confidence), we introduce the problem of compression of SRM and also propose a solution named ComSR for MDL-based compressing of sequential rules based on the designed sequential rule coding scheme. To our knowledge, we are the first to use sequential rules to encode an entire database. A heuristic method is proposed to find a set of compact and meaningful sequential rules as much as possible. ComSR has two trade-off algorithms, ComSR_non and ComSR_ful, based on whether the database can be completely compressed. Experiments done on a real dataset with different thresholds show that a set of compact and meaningful sequential rules can be found. This shows that the proposed method works.

Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes