Optimal Transport Based Change Point Detection and Time Series Segment Clustering
This work addresses time series analysis problems relevant to data scientists, but its contribution is incremental, as it builds on existing theoretical advances.
The paper tackles change point detection and time series segment clustering by proposing a distribution-free algorithm based on Wasserstein two-sample tests, showing benefits on synthetic and real data.
Two common problems in time series analysis are the decomposition of the data stream into disjoint segments that are each in some sense "homogeneous", a problem known as Change Point Detection (CPD), and the grouping of similar nonadjacent segments, a problem that we call Time Series Segment Clustering (TSSC). Building upon recent theoretical advances characterizing the limiting distribution-free behavior of the Wasserstein two-sample test (Ramdas et al. 2015), we propose a novel algorithm for unsupervised, distribution-free CPD that is amenable to both offline and online settings. We also introduce a method to mitigate false positives in CPD, and we address TSSC by using the Wasserstein distance between the detected segments to build an affinity matrix to which we apply spectral clustering. Results on both synthetic and real data sets show the benefits of the approach.
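
As a rough illustration of the two ingredients named above, the following sketch computes a sliding-window Wasserstein statistic for CPD and a Wasserstein-based affinity matrix for TSSC. It is not the algorithm of the paper: the function names, the window size `w`, the fixed `threshold`, the local-maximum rule, and the exponential kernel with bandwidth `sigma` are all assumptions made for illustration, and the plain 1-Wasserstein distance on univariate data stands in for the distribution-free test statistic of Ramdas et al. (2015).

```python
import math


def wasserstein_1d(x, y):
    """1-Wasserstein distance between two univariate empirical samples,
    computed as the integral of |F_x(t) - F_y(t)| over t."""
    xs, ys = sorted(x), sorted(y)
    all_vals = sorted(xs + ys)
    i = j = 0
    total = 0.0
    for k in range(len(all_vals) - 1):
        v = all_vals[k]
        # Advance the empirical CDF counters up to and including v.
        while i < len(xs) and xs[i] <= v:
            i += 1
        while j < len(ys) and ys[j] <= v:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (all_vals[k + 1] - v)
    return total


def window_statistic(series, w):
    """Distance between the adjacent windows [t-w, t) and [t, t+w)."""
    return {
        t: wasserstein_1d(series[t - w:t], series[t:t + w])
        for t in range(w, len(series) - w + 1)
    }


def detect_change_points(series, w, threshold):
    """Hypothetical rule: keep t if its statistic exceeds `threshold`
    and is a local maximum within +/- w samples."""
    stats = window_statistic(series, w)
    change_points = []
    for t, s in stats.items():
        neighbours = (stats.get(u, 0.0) for u in range(t - w, t + w + 1))
        if s > threshold and all(s >= n for n in neighbours):
            change_points.append(t)
    return change_points


def affinity_matrix(series, change_points, sigma=1.0):
    """Affinity exp(-W1 / sigma) between all pairs of detected segments.
    The kernel and `sigma` are assumptions, not taken from the paper."""
    bounds = [0] + list(change_points) + [len(series)]
    segments = [series[a:b] for a, b in zip(bounds, bounds[1:])]
    return [
        [math.exp(-wasserstein_1d(p, q) / sigma) for q in segments]
        for p in segments
    ]


# Toy series with two level shifts; the first and last segments match.
series = [0.0] * 50 + [5.0] * 50 + [0.0] * 50
cps = detect_change_points(series, w=10, threshold=1.0)
affinity = affinity_matrix(series, cps)
print(cps)
```

The resulting matrix could then be handed to any spectral clustering routine that accepts a precomputed affinity, for example scikit-learn's `SpectralClustering(affinity="precomputed")`. In practice the threshold would be calibrated from the limiting distribution of the test statistic rather than fixed by hand as it is here.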