ME ML Feb 13, 2018

Clustering and Semi-Supervised Classification for Clickstream Data via Mixture Models

Michael P. B. Gallaugher, Paul D. McNicholas

arXiv:1802.04849v22.33 citations

Originality: Incremental advance

AI Analysis

The paper addresses the lack of statistical learning approaches for clickstream data, an emerging data type. The method is nonetheless incremental: it extends existing mixture models by incorporating continuous time.

The paper tackles clustering and semi-supervised classification of clickstream data by introducing a mixture of first-order continuous time Markov models, which accounts for the time spent on each webpage, and reports evaluation results on simulated and real data.

Finite mixture models have been used for unsupervised learning for some time, and their use within the semi-supervised paradigm is becoming more commonplace. Clickstream data is one of the various emerging data types that demands particular attention because there is a notable paucity of statistical learning approaches currently available. A mixture of first-order continuous time Markov models is introduced for unsupervised and semi-supervised learning of clickstream data. This approach assumes continuous time, which distinguishes it from existing mixture model-based approaches; practically, this allows account to be taken of the amount of time each user spends on each webpage. The approach is evaluated, and compared to the discrete time approach, using simulated and real data.
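To make the continuous-time idea concrete, the following is a minimal sketch of how one clickstream could be scored under a mixture of first-order continuous time Markov models. It is an illustration under stated assumptions, not the authors' implementation: each component is assumed to have initial-state probabilities `alpha`, a jump-chain transition matrix `P`, and exponential holding rates `lam`, and the time spent on the final page is ignored. The names `ctmc_loglik` and `responsibilities` are hypothetical.

```python
import numpy as np

def ctmc_loglik(states, times, alpha, P, lam):
    """Log-likelihood of one clickstream under a single component.

    Assumed parameterization (not taken from the paper):
    states : visited page indices s_0, ..., s_n
    times  : time spent on s_0, ..., s_{n-1} before each click
    alpha  : initial-state probabilities
    P      : jump-chain transition probabilities
    lam    : exponential holding rate for each page
    """
    states = np.asarray(states)
    times = np.asarray(times, dtype=float)
    src, dst = states[:-1], states[1:]
    ll = np.log(alpha[states[0]])
    # Each click contributes an exponential holding-time density
    # and a transition probability.
    ll += np.sum(np.log(lam[src]) - lam[src] * times + np.log(P[src, dst]))
    return ll

def responsibilities(states, times, pis, components):
    """Posterior component-membership probabilities for one clickstream."""
    log_w = np.array([
        np.log(pi) + ctmc_loglik(states, times, *comp)
        for pi, comp in zip(pis, components)
    ])
    log_w -= log_w.max()  # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

# Toy example: two pages, two components that differ only in dwell rates.
alpha = np.array([0.5, 0.5])
P = np.array([[0.0, 1.0], [1.0, 0.0]])
fast = (alpha, P, np.array([1.0, 1.0]))   # short visits on average
slow = (alpha, P, np.array([0.1, 0.1]))   # long visits on average

stream_states = [0, 1, 0]
stream_times = [10.0, 12.0]  # long dwell times
r = responsibilities(stream_states, stream_times, [0.5, 0.5], [fast, slow])
print(r)
```

In this toy setup both components share the same transition structure, so a discrete time model could not tell them apart; only the dwell times separate the two groups. In a semi-supervised setting, the responsibilities of labeled clickstreams would be fixed to their known labels rather than computed this way.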
