Parsimonious mixtures of multivariate contaminated normal distributions
Abstract
A mixture of multivariate contaminated normal distributions is developed for
model-based clustering. In addition to the parameters of the classical normal
mixture, our contaminated mixture has, for each cluster, a parameter
controlling the proportion of mild outliers and one specifying the degree of
contamination. Crucially, these parameters do not have to be specified a
priori, which adds flexibility to our approach. Parsimony is introduced via
eigen-decomposition of the component covariance matrices, and sufficient
conditions for the identifiability of all the members of the resulting family
are provided. An expectation-conditional maximization algorithm is outlined for
parameter estimation, and various implementation issues are discussed. Using a
large-scale simulation study, the behaviour of the proposed approach is
investigated and compared with that of well-established finite mixtures.
The performance of this novel family of models is also illustrated on
artificial and real data.
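
To make the two extra per-cluster parameters concrete, the following is a minimal sketch of a single contaminated normal component density, written under assumptions that the abstract does not state. It assumes the common parameterization in which alpha in (0, 1) is the proportion of "good" points (so 1 - alpha is the proportion of mild outliers) and eta > 1 inflates the covariance matrix of the outlying part. The function names (cholesky, normal_pdf, contaminated_normal_pdf, prob_good) are hypothetical and chosen for illustration only; they are not taken from the paper or from any package.

```python
import math


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def normal_pdf(x, mu, sigma):
    """Multivariate normal density at x with mean mu and covariance sigma."""
    p = len(x)
    low = cholesky(sigma)
    # Forward substitution for low @ z = x - mu; the sum of z_i^2 is the
    # squared Mahalanobis distance of x from mu.
    z = []
    for i in range(p):
        r = x[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))
        z.append(r / low[i][i])
    maha = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(p))
    return math.exp(-0.5 * (p * math.log(2.0 * math.pi) + log_det + maha))


def contaminated_normal_pdf(x, mu, sigma, alpha, eta):
    """Assumed form: alpha * N(mu, sigma) + (1 - alpha) * N(mu, eta * sigma)."""
    if not 0.0 < alpha < 1.0 or eta <= 1.0:
        raise ValueError("this sketch assumes 0 < alpha < 1 and eta > 1")
    inflated = [[eta * v for v in row] for row in sigma]
    return (alpha * normal_pdf(x, mu, sigma)
            + (1.0 - alpha) * normal_pdf(x, mu, inflated))


def prob_good(x, mu, sigma, alpha, eta):
    """Posterior probability that x comes from the non-inflated part."""
    good = alpha * normal_pdf(x, mu, sigma)
    return good / contaminated_normal_pdf(x, mu, sigma, alpha, eta)


if __name__ == "__main__":
    mu = [0.0, 0.0]
    sigma = [[1.0, 0.0], [0.0, 1.0]]
    for point in ([0.0, 0.0], [5.0, 5.0]):
        print(point, prob_good(point, mu, sigma, alpha=0.9, eta=4.0))
```

Under these assumptions, a point near the component mean receives a high probability of being "good", whereas a point far from it is attributed mostly to the inflated part. A mixture would combine several such components with mixing weights.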