Agglomerative Likelihood Clustering
This is an incremental improvement for time-series clustering applications, particularly in finance or online learning, where efficiency and scalability are critical.
The authors tackle the problem of fast time-series data clustering by developing an Agglomerative Likelihood Clustering algorithm (ALC) that reduces compute time and resource usage for large datasets (up to 20,000 assets) without requiring prior knowledge of the number of clusters.
We consider the problem of fast time-series data clustering. Building on previous work modeling the correlation-based Hamiltonian of spin variables, we present an updated, fast, and inexpensive Agglomerative Likelihood Clustering algorithm (ALC). The method replaces the optimized genetic-algorithm-based approach (f-SPC) with an agglomerative recursive merging framework inspired by previous work in Econophysics and Community Detection. The method is tested on noisy synthetic correlated time-series datasets with built-in cluster structure to demonstrate that the algorithm produces meaningful, non-trivial results. We apply it to time-series datasets as large as 20,000 assets and argue that ALC can reduce compute time and resource usage for large-scale time-series clustering while running serially, so it has no obvious parallelization requirement. Because the algorithm requires no prior information about the number of clusters, it can be an effective choice for state detection in online learning in a fast, non-linear data environment.
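
The merging idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the per-cluster log-likelihood takes the Giada-Marsili form known from the econophysics literature, which the abstract does not state: L = 1/2 [ln(n/c) + (n - 1) ln((n^2 - n)/(n^2 - c))], where n is the cluster size and c is the sum of the correlation-matrix entries inside the cluster. The function names `cluster_loglik` and `alc_sketch`, the greedy best-pair merge rule, and the handling of degenerate cases are all assumptions made for illustration.

```python
import numpy as np

def cluster_loglik(n, c):
    """Log-likelihood contribution of one cluster (assumed Giada-Marsili form).

    n: number of series in the cluster
    c: sum of all correlation-matrix entries within the cluster
    """
    # Assumption: singletons and clusters with no net positive internal
    # correlation (c <= n) contribute nothing.
    if n <= 1 or c <= n:
        return 0.0
    # Assumption: clip c just below n^2 so perfectly correlated
    # clusters do not produce a division by zero.
    c = min(c, n * n * (1 - 1e-12))
    return 0.5 * (np.log(n / c) + (n - 1) * np.log((n * n - n) / (n * n - c)))

def internal_sum(C, idx):
    """Sum of the correlation entries among the series in idx."""
    return C[np.ix_(idx, idx)].sum()

def alc_sketch(C):
    """Greedy agglomerative merging driven by the likelihood gain.

    Starts from singletons and repeatedly merges the pair of clusters
    whose union raises the total log-likelihood the most. Stops when no
    merge gives a positive gain, so the number of clusters is not an input.
    """
    clusters = [[i] for i in range(C.shape[0])]
    while len(clusters) > 1:
        best_gain, best_pair = 0.0, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ia, ib = clusters[a], clusters[b]
                merged = ia + ib
                gain = (cluster_loglik(len(merged), internal_sum(C, merged))
                        - cluster_loglik(len(ia), internal_sum(C, ia))
                        - cluster_loglik(len(ib), internal_sum(C, ib)))
                if gain > best_gain:
                    best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            break  # no merge improves the likelihood
        a, b = best_pair
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Synthetic example: six noisy series driven by two independent factors.
rng = np.random.default_rng(0)
T = 1000
factors = rng.standard_normal((2, T))
X = np.vstack([factors[g] + 0.5 * rng.standard_normal(T)
               for g in (0, 0, 0, 1, 1, 1)])
C = np.corrcoef(X)
clusters = sorted(sorted(c) for c in alc_sketch(C))
print(clusters)
```

This sketch recomputes every cluster sum from scratch and scans all pairs at each step, so it is only suitable for small examples. A version meant for thousands of assets would need to cache the per-cluster sums and update them incrementally after each merge.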