Boosting Frequent Itemset Mining via Early Stopping Intersections
This work provides an incremental improvement for data mining practitioners by speeding up existing frequent itemset mining algorithms.
The paper tackles the problem of reducing support checking time in frequent itemset mining algorithms by introducing an early-stopping technique to detect infrequent candidates early, resulting in runtime reduction across various datasets.
Mining frequent itemsets from a transaction database is a fundamental problem in data mining and a building block for many pattern mining tasks. In this paper, we present a general technique to reduce support checking time in existing depth-first search generate-and-test schemes such as Eclat/dEclat and PrePost+. The technique is based on an early-stopping criterion that allows infrequent candidate itemsets to be detected early, and it is general enough to apply to many frequent itemset mining algorithms. We have applied it to two TID-list based schemes (Eclat/dEclat) and one N-list based scheme (PrePost+). Experiments over a variety of datasets confirm its effectiveness in reducing runtime.
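To illustrate the idea of stopping an intersection early, here is a minimal Python sketch for sorted TID-lists as used in Eclat. The abstract does not state the exact criterion, so the bound used here is an assumption: the intersection is abandoned once the matches found so far plus the shorter remaining tail can no longer reach the minimum support. The function name `intersect_with_early_stop` and the parameter `min_support` are hypothetical, and the dEclat and PrePost+ variants are not covered.

```python
from typing import List, Optional, Sequence


def intersect_with_early_stop(
    a: Sequence[int], b: Sequence[int], min_support: int
) -> Optional[List[int]]:
    """Intersect two sorted TID-lists.

    Returns the intersection if its size is at least min_support,
    otherwise None. The scan is abandoned as soon as the candidate
    is provably infrequent (assumed criterion, not taken from the paper).
    """
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        # Upper bound on the final size: matches so far plus the
        # shorter of the two unscanned tails.
        if len(out) + min(len(a) - i, len(b) - j) < min_support:
            return None  # candidate cannot become frequent
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out if len(out) >= min_support else None


# Usage: TID-lists of two itemsets sharing a prefix.
tids_x = [1, 2, 3, 5, 8]
tids_y = [2, 3, 4, 8, 9]
frequent = intersect_with_early_stop(tids_x, tids_y, min_support=3)
infrequent = intersect_with_early_stop(tids_x, tids_y, min_support=4)
```

With a threshold of 3 the full intersection is returned; with a threshold of 4 the scan stops before reaching the end of either list, which is where the saving in support checking time would come from under this assumed bound.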