Skewed Distributions or Transformations? Modelling Skewness for a Cluster Analysis
Abstract
Because of its mathematical tractability, the Gaussian mixture model holds a
special place in the literature for clustering and classification. For all its
benefits, however, the Gaussian mixture model poses problems when the data is
skewed or contains outliers. Because of this, methods for handling skewed data
have been developed over the years, and they fall into two general categories. The
first is to consider a mixture of more flexible skewed distributions, and the
second is based on incorporating a transformation to near normality. Although
these methods have been compared in their respective papers, there has yet to
be a detailed comparison to determine when one method might be more suitable
than the other. Herein, we provide a detailed comparison on many benchmark
datasets and describe a novel method for assessing cluster separation.