Flexible Clustering with a Sparse Mixture of Generalized Hyperbolic Distributions
Abstract
Robust clustering of high-dimensional data is an important topic because
clusters in real datasets are often heavy-tailed and/or asymmetric. Traditional
approaches to model-based clustering often fail for high-dimensional data,
e.g., due to the number of free covariance parameters. A parameterization of the
component scale matrices for the mixture of generalized hyperbolic
distributions is proposed. This parameterization includes a penalty term in the
likelihood. An analytically feasible expectation-maximization algorithm is
developed by placing a gamma-lasso penalty on the concentration
matrix. The proposed methodology is investigated through simulation studies and
illustrated using two real datasets.