Ranked differences Pearson correlation dissimilarity with an application to electricity users time series clustering
This work is an incremental improvement for time series clustering in fields such as energy, where it could aid customer segmentation.
The authors address the problem of clustering time series with complex patterns by proposing a new dissimilarity measure, the ranked differences Pearson correlation dissimilarity (RDPC). It outperformed existing methods in cases involving different seasonal patterns, trends, and peaks, and was demonstrated on an electricity consumption dataset.
Time series clustering is an unsupervised learning method for classifying time series data into groups with similar behavior. It is used in applications such as healthcare, finance, economics, energy, and climate science. Several time series clustering methods have been introduced and used for over four decades. Most of them focus on measuring either Euclidean distances or association dissimilarities between time series. In this work, we propose a new dissimilarity measure called ranked differences Pearson correlation dissimilarity (RDPC), which combines a weighted average of a specified fraction of the largest element-wise differences with the well-known Pearson correlation dissimilarity. The measure is incorporated into hierarchical clustering, and its performance is evaluated against existing clustering algorithms. The results show that the RDPC algorithm outperforms the others in complicated cases involving different seasonal patterns, trends, and peaks. Finally, we demonstrate our method by clustering a random sample of customers from a Thai electricity consumption time series dataset into seven groups with unique characteristics.
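To make the construction described above concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The abstract does not give the exact formula, so several choices here are assumptions: the linearly increasing weights on the largest differences, the convex combination controlled by a hypothetical parameter `alpha`, the default fraction `frac=0.2`, and the use of average linkage in the hierarchical step. All function names (`ranked_difference`, `rdpc_sketch`, `average_linkage`) are illustrative only.

```python
import math


def pearson_dissimilarity(x, y):
    """Return 1 - r, where r is the Pearson correlation of x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)  # undefined for constant series


def ranked_difference(x, y, frac=0.2):
    """Weighted average of the largest `frac` of absolute element-wise differences.

    Assumption: weights grow linearly with rank, so the largest
    difference contributes most. The paper may weight differently.
    """
    diffs = sorted(abs(a - b) for a, b in zip(x, y))  # ascending
    k = max(1, math.ceil(frac * len(diffs)))
    top = diffs[-k:]
    weights = range(1, k + 1)
    return sum(w * d for w, d in zip(weights, top)) / sum(weights)


def rdpc_sketch(x, y, frac=0.2, alpha=0.5):
    """Assumed combination: a convex mix of the two components."""
    return (alpha * ranked_difference(x, y, frac)
            + (1 - alpha) * pearson_dissimilarity(x, y))


def average_linkage(series, n_clusters, dissim):
    """Naive agglomerative clustering with average linkage.

    Returns a list of clusters, each a list of indices into `series`.
    """
    n = len(series)
    d = {(i, j): dissim(series[i], series[j])
         for i in range(n) for j in range(i + 1, n)}

    def link(a, b):
        # Mean pairwise dissimilarity between two clusters.
        return sum(d[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        p, q = min(pairs, key=lambda pq: link(clusters[pq[0]], clusters[pq[1]]))
        merged = clusters.pop(q)  # q > p, so index p stays valid
        clusters[p].extend(merged)
    return clusters


# Toy data (made up for illustration): two seasonal series and two trending series.
t = range(12)
a = [math.sin(2 * math.pi * i / 12) for i in t]
b = [1.1 * math.sin(2 * math.pi * i / 12) for i in t]
c = [i / 11 for i in t]
e = [i / 11 + 0.05 for i in t]

clusters = average_linkage([a, b, c, e], n_clusters=2, dissim=rdpc_sketch)
print(clusters)  # [[0, 1], [2, 3]]
```

On this toy input the two seasonal series and the two trending series end up in separate groups. Because the ranked-difference term is in the units of the data while the Pearson term is unitless, series would typically need to be put on a comparable scale before mixing the two. Whether and how the paper does this is not stated in the abstract.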