Flexible High-Dimensional Unsupervised Learning with Missing Data
Abstract
The mixture of factor analyzers (MFA) model is a well-known mixture model-based
approach for unsupervised learning with high-dimensional data. It can be
useful, inter alia, in situations where the data dimensionality far exceeds the
number of observations. In recent years, the MFA model has been extended to
non-Gaussian mixtures to account for clusters with heavier tail weight and/or
asymmetry. The mixture of generalized hyperbolic factor analyzers (MGHFA) model is one
such extension, which leads to a flexible modelling paradigm that accounts for
both heavier tail weight and cluster asymmetry. In many practical applications,
the presence of missing values complicates data analyses. A generalization of
the MGHFA model is presented to accommodate missing values. Under a
missing-at-random mechanism, we develop a computationally efficient alternating
expectation conditional maximization algorithm for parameter estimation of the
MGHFA model with different patterns of missing values. The imputation of
missing values under the incomplete-data structure of the MGHFA model is also
investigated. The performance of the proposed methodology is illustrated
through the analysis of simulated and real data.