DB, ML | May 6, 2015

Cats & Co: Categorical Time Series Coclustering

arXiv:1505.01300v1
Originality: Incremental advance
AI Analysis

This work addresses the analysis of temporal event sequences for researchers and practitioners in fields such as data mining. It offers a new clustering approach, though the advance over existing coclustering techniques is incremental.

The paper tackles clustering and exploratory analysis of categorical time series by proposing a method based on three-dimensional data grid models. The method groups sequences with similar event distributions over time, and the authors report its effectiveness on synthetic and real-world datasets.

We suggest a novel method of clustering and exploratory analysis of temporal event sequence data (also known as categorical time series) based on three-dimensional data grid models. A data set of temporal event sequences can be represented as a data set of three-dimensional points, where each point is defined by three variables: a sequence identifier, a time value and an event value. Instantiating data grid models on these 3D points turns the problem into 3D coclustering: the sequences are partitioned into clusters, the time variable is discretized into intervals, and the events are partitioned into clusters. The cross-product of the univariate partitions forms a multivariate partition of the representation space, i.e., a grid of cells, which also serves as a nonparametric estimator of the joint distribution of the sequence, time and event dimensions. Thus, sequences are grouped together because they have a similar joint distribution of time and events, i.e., a similar distribution of events along the time dimension. The best data grid is computed using a parameter-free Bayesian model selection approach. We also suggest several criteria for exploiting the resulting grid through agglomerative hierarchies, for interpreting the clusters of sequences, and for characterizing their components through insightful visualizations. Extensive experiments on both synthetic and real-world data sets demonstrate that data grid models are efficient and effective, and that they discover meaningful underlying patterns in categorical time series data.
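The 3D-point representation and the grid of cells described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy data, the hand-picked partitions, and the names `cell_of`, `grid_counts` and `joint_estimate` do not come from the paper. In particular, the partitions are fixed by hand here, whereas the paper selects them with a Bayesian model selection criterion that this sketch does not implement.

```python
from bisect import bisect_right
from collections import Counter

# Toy data set (made up): each point is (sequence_id, time, event).
points = [
    ("s1", 0.5, "a"), ("s1", 1.5, "b"), ("s1", 7.0, "c"),
    ("s2", 0.7, "a"), ("s2", 6.5, "c"),
    ("s3", 1.0, "c"), ("s3", 8.0, "a"),
]

# Hypothetical univariate partitions, chosen by hand for illustration.
sequence_cluster = {"s1": 0, "s2": 0, "s3": 1}   # clusters of sequences
time_cuts = [5.0]                                # intervals: t < 5.0 and t >= 5.0
event_cluster = {"a": 0, "b": 0, "c": 1}         # clusters of events


def cell_of(point):
    """Map a 3D point to its grid cell (sequence cluster, time interval, event cluster)."""
    seq, t, ev = point
    return (sequence_cluster[seq], bisect_right(time_cuts, t), event_cluster[ev])


def grid_counts(pts):
    """Count how many points fall in each cell of the cross-product grid."""
    return Counter(cell_of(p) for p in pts)


def joint_estimate(pts):
    """Piecewise-constant estimate of the joint distribution: cell frequency / total."""
    counts = grid_counts(pts)
    total = len(pts)
    return {cell: c / total for cell, c in counts.items()}


if __name__ == "__main__":
    for cell, prob in sorted(joint_estimate(points).items()):
        print(cell, round(prob, 3))
```

In this toy grid, sequences `s1` and `s2` share a cluster because their events fall into the same (time interval, event cluster) cells, which is the sense in which the abstract says sequences are grouped by similar distributions of events over time.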

