Flexible Variable Selection for Clustering and Classification
Abstract
The importance of variable selection for clustering has been recognized for
some time, and mixture models are well-established as a statistical approach to
clustering. Yet, the literature on variable selection in model-based clustering
remains largely rooted in the assumption of Gaussian clusters. Unsurprisingly,
variable selection algorithms based on this assumption tend to break down in
the presence of cluster skewness. A novel variable selection algorithm is
presented that utilizes the Manly transformation mixture model to select
variables based on their ability to separate clusters, and is effective even
when clusters depart from the Gaussian assumption. The proposed approach, which
is implemented within the R package vscc, is compared to existing variable
selection methods -- including one that can account for cluster
skewness -- using simulated and real datasets.
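
For readers unfamiliar with the Manly transformation mentioned above, the following is a minimal sketch of its standard univariate form, y = (exp(lambda * x) - 1) / lambda for lambda != 0 and y = x for lambda = 0. It is not the vscc implementation: the function names, the toy data, and the choice lambda = -1 are assumptions made purely for illustration, and in a mixture-model setting the transformation parameters would be estimated rather than fixed by hand.

```python
import math


def manly_transform(x, lam):
    """Standard Manly exponential transformation of a single value.

    Returns x unchanged when lam == 0, otherwise (exp(lam * x) - 1) / lam.
    """
    if lam == 0.0:
        return x
    return (math.exp(lam * x) - 1.0) / lam


def sample_skewness(values):
    """Moment-based sample skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed toy sample (not taken from the paper).
toy_data = [0.1, 0.2, 0.3, 0.5, 4.0]

# A negative lambda compresses the right tail; -1.0 is an arbitrary choice.
transformed = [manly_transform(x, -1.0) for x in toy_data]

print("skewness before:", sample_skewness(toy_data))
print("skewness after: ", sample_skewness(transformed))
```

On this toy sample the transformed values are less right-skewed than the originals, which is the property that lets a Gaussian component fit transformed data drawn from a skewed cluster.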