Model-Based Clustering of High-Dimensional Binary Data
Abstract
We propose a mixture of latent trait models with common slope parameters
(MCLT) for model-based clustering of high-dimensional binary data, a data type
for which few established methods exist. We extend recent work on clustering
binary data via a $d$-dimensional Gaussian latent variable by incorporating
common factor analyzers; as a result, our approach enables a low-dimensional
visual representation of the clusters. We further extend the model by
incorporating random block effects: dependencies within each block are
captured through block-specific parameters that are treated as random
variables. A variational approximation to the likelihood is used to derive a
fast algorithm for estimating the model parameters. We demonstrate our approach
on real and simulated data.
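The Python sketch below illustrates, under stated assumptions, two ingredients named in the abstract. First, the hypothetical helper `simulate_mclt` draws binary data from an assumed MCLT-style generative model: a cluster label $z$, a $d$-dimensional Gaussian latent variable $y$ with a cluster-specific mean, and item probabilities $\sigma(b_m + w_m^\top y)$ whose slopes $w_m$ and intercepts $b_m$ are common to all clusters. The paper's exact parameterization (for example, cluster-specific latent covariances) may differ, and random block effects are omitted. Second, because the logistic likelihood cannot be integrated over $y$ in closed form, variational methods for latent trait models commonly replace $\sigma(t)$ by the Jaakkola and Jordan lower bound $\sigma(t) \ge \sigma(\xi)\exp\{(t-\xi)/2 - \lambda(\xi)(t^2-\xi^2)\}$ with $\lambda(\xi) = \tanh(\xi/2)/(4\xi)$. Its exponent is quadratic in $t$, so with $t$ linear in $y$ the bound is conjugate to a Gaussian prior. We assume, without confirmation from the text above, that a bound of this kind underlies the variational approximation; the names `jj_lambda` and `jj_lower_bound` are likewise hypothetical.

```python
import numpy as np


def sigmoid(t):
    """Logistic function 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))


def simulate_mclt(n, pi, mu, W, b, rng):
    """Draw n binary vectors from an ASSUMED MCLT-style model (illustrative only).

    z ~ Categorical(pi); y | z = g ~ N(mu[g], I_d);
    P(x_m = 1 | y) = sigmoid(b[m] + W[m] @ y).
    W (M x d slopes) and b (M intercepts) are shared by all clusters;
    clusters differ only through the latent means mu (G x d).
    Random block effects are not modelled here.
    """
    G, d = mu.shape
    z = rng.choice(G, size=n, p=pi)            # cluster labels
    y = mu[z] + rng.standard_normal((n, d))    # latent positions
    prob = sigmoid(b + y @ W.T)                # n x M success probabilities
    x = (rng.random(prob.shape) < prob).astype(int)
    return x, z, y


def jj_lambda(xi):
    """lambda(xi) = tanh(xi / 2) / (4 xi), using its limit 1/8 at xi = 0."""
    xi = np.asarray(xi, dtype=float)
    small = np.abs(xi) < 1e-8
    safe = np.where(small, 1.0, xi)            # avoid division by zero
    return np.where(small, 0.125, np.tanh(safe / 2.0) / (4.0 * safe))


def jj_lower_bound(t, xi):
    """Jaakkola-Jordan lower bound on sigmoid(t); tight at t = xi and t = -xi."""
    t = np.asarray(t, dtype=float)
    return sigmoid(xi) * np.exp((t - xi) / 2.0 - jj_lambda(xi) * (t**2 - xi**2))


# Usage: two clusters separated along the first latent axis (made-up settings).
rng = np.random.default_rng(0)
M, d = 20, 2
pi = np.array([0.5, 0.5])
mu = np.array([[-2.0, 0.0], [2.0, 0.0]])
W = rng.standard_normal((M, d))
b = np.zeros(M)
x, z, y = simulate_mclt(500, pi, mu, W, b, rng)

# Check that the bound never exceeds the logistic function on a grid.
t_grid = np.linspace(-8.0, 8.0, 1601)
gap = sigmoid(t_grid) - jj_lower_bound(t_grid, 2.0)
print(x.shape, gap.min() >= -1e-12)
```

With $d = 2$, the simulated latent positions `y` can be scattered and coloured by `z`; in a fitted model, posterior means of the latent variables could play that role, which is one way to read the abstract's low-dimensional visual representation. Both the latent-mean parameterization and the choice of this particular bound are assumptions of the sketch.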