Batch Incremental Shared Nearest Neighbor Density Based Clustering Algorithm for Dynamic Datasets
This work addresses the need for scalable clustering in dynamic data environments, offering a practical improvement over existing methods that are limited to single-point insertions and cannot handle deletions.
The paper tackles the problem of efficiently clustering dynamic datasets with batch updates, presenting an incremental algorithm that handles both insertions and deletions, achieving up to four orders of magnitude faster performance than the baseline SNND algorithm while using up to 60% extra memory.
Incremental data mining algorithms process frequent updates to dynamic datasets efficiently by avoiding redundant computation. Existing incremental extension to shared nearest neighbor density based clustering (SNND) algorithm cannot handle deletions to dataset and handles insertions only one point at a time. We present an incremental algorithm to overcome both these bottlenecks by efficiently identifying affected parts of clusters while processing updates to dataset in batch mode. We show effectiveness of our algorithm by performing experiments on large synthetic as well as real world datasets. Our algorithm is up to four orders of magnitude faster than SNND and requires up to 60% extra memory than SNND while providing output identical to SNND.