DBAIJan 30, 2017

Comparing Dataset Characteristics that Favor the Apriori, Eclat or FP-Growth Frequent Itemset Mining Algorithms

arXiv:1701.09042v185 citations
Originality Synthesis-oriented
AI Analysis

It addresses a gap in understanding algorithm performance for similarly sized datasets with varying itemset characteristics, which is incremental but useful for data mining practitioners.

This paper investigates how dataset characteristics, specifically frequent item density and maximum transaction size, affect the performance of Apriori, Eclat, and FP-Growth algorithms, finding that Eclat and FP-Growth handle increases in these characteristics better than Apriori.

Frequent itemset mining is a popular data mining technique. Apriori, Eclat, and FP-Growth are among the most common algorithms for frequent itemset mining. Considerable research has been performed to compare the relative performance between these three algorithms, by evaluating the scalability of each algorithm as the dataset size increases. While scalability as data size increases is important, previous papers have not examined the performance impact of similarly sized datasets that contain different itemset characteristics. This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. To perform this empirical analysis, a dataset generator is created to measure the effects of frequent item density and the maximum transaction size on performance. The generated datasets contain the same number of rows. This provides some insight into dataset characteristics that are conducive to each algorithm. The results of this paper's research demonstrate Eclat and FP-Growth both handle increases in maximum transaction size and frequent itemset density considerably better than the Apriori algorithm. This paper explores the effects that two dataset characteristics can have on the performance of these three frequent itemset algorithms. To perform this empirical analysis, a dataset generator is created to measure the effects of frequent item density and the maximum transaction size on performance. The generated datasets contain the same number of rows. This provides some insight into dataset characteristics that are conducive to each algorithm. The results of this paper's research demonstrate Eclat and FP-Growth both handle increases in maximum transaction size and frequent itemset density considerably better than the Apriori algorithm.

Code Implementations2 repos
Foundations

The foundational work for this paper's niche, ranked by how specifically the neighbourhood builds on it — not by global fame.

Your Notes