Clustering and Semi-Supervised Classification for Clickstream Data via Mixture Models
Abstract
Finite mixture models have been used for unsupervised learning for some time,
and their use within the semi-supervised paradigm is becoming more commonplace.
Clickstream data is one of several emerging data types that demand
particular attention, because few statistical learning approaches are
currently available for them. A mixture of first-order continuous-time Markov
models is introduced for unsupervised and semi-supervised learning of
clickstream data. This approach assumes continuous time, which distinguishes it
from existing mixture model-based approaches; in practice, it allows the amount
of time each user spends on each webpage to be taken into account. The
approach is evaluated, and compared with the discrete-time approach, using
simulated and real data.
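To illustrate how continuous time lets dwell times enter the model, below is a minimal Python sketch (not the authors' implementation). It computes the log-likelihood of one clickstream under a single first-order continuous-time Markov chain, then the posterior component memberships under a finite mixture of such chains. The function names `ctmc_loglik` and `responsibilities`, the rate-matrix parametrization, ignoring the dwell time on the final page, and disallowing consecutive repeats of the same page are all assumptions made for illustration.

```python
import math


def ctmc_loglik(pages, dwell_times, init_probs, rates):
    """Log-likelihood of one clickstream under one continuous-time Markov chain.

    Hypothetical helper (illustrative assumption, not from the source).
    pages:       page indices s_0, ..., s_n visited in order
    dwell_times: t_0, ..., t_{n-1}; t_k is the time spent on pages[k]
                 before moving to pages[k + 1] (final page's time is ignored)
    init_probs:  init_probs[i] = probability a session starts on page i
    rates:       rates[i][j] = transition rate q_ij from page i to page j;
                 diagonal entries are ignored, and the exit rate q_i is
                 the sum of the off-diagonal entries in row i
    """
    if len(dwell_times) != len(pages) - 1:
        raise ValueError("need exactly one dwell time per transition")
    ll = math.log(init_probs[pages[0]])
    for k, t in enumerate(dwell_times):
        i, j = pages[k], pages[k + 1]
        if i == j:
            # A continuous-time chain has no self-jumps (assumption of this sketch).
            raise ValueError("consecutive pages must differ")
        exit_rate = sum(r for m, r in enumerate(rates[i]) if m != i)
        # Density of staying on page i for time t, then jumping to page j:
        # q_ij * exp(-q_i * t), so its log is log(q_ij) - q_i * t.
        ll += math.log(rates[i][j]) - exit_rate * t
    return ll


def responsibilities(pages, dwell_times, weights, components):
    """Posterior probability that the clickstream belongs to each component.

    weights:    mixing proportions tau_g (summing to 1)
    components: list of (init_probs, rates) pairs, one per mixture component
    """
    logs = [
        math.log(w) + ctmc_loglik(pages, dwell_times, pi, q)
        for w, (pi, q) in zip(weights, components)
    ]
    # Log-sum-exp for numerical stability when normalizing.
    top = max(logs)
    log_total = top + math.log(sum(math.exp(v - top) for v in logs))
    return [math.exp(v - log_total) for v in logs]


if __name__ == "__main__":
    # Two pages; component 0 has fast transitions, component 1 slow ones.
    fast = ([0.5, 0.5], [[0.0, 2.0], [1.0, 0.0]])
    slow = ([0.5, 0.5], [[0.0, 0.1], [0.1, 0.0]])
    stream, times = [0, 1, 0], [0.5, 1.0]
    print(ctmc_loglik(stream, times, *fast))
    print(responsibilities(stream, times, [0.5, 0.5], [fast, slow]))
```

In a discrete-time first-order Markov chain, each term `log(q_ij) - q_i * t` would instead be the log of a transition probability, so the dwell times `t` would play no role. This is the information that the continuous-time formulation retains. The responsibilities above are the quantities an EM-type fitting procedure would alternate with parameter updates; fitting itself is not shown.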