Robust Clustering in Regression Analysis via the Contaminated Gaussian Cluster-Weighted Model
Journal Articles
Abstract
The Gaussian cluster-weighted model (CWM) is a mixture of regression models
with random covariates that allows for flexible clustering of a random vector
composed of response variables and covariates. In each mixture component, it
adopts a Gaussian distribution for both the covariates and the responses given
the covariates. To robustify the approach with respect to possible elliptical
heavy-tailed departures from normality due to the presence of atypical
observations, the contaminated Gaussian CWM is introduced here. In addition to
the parameters of the Gaussian CWM, each mixture component of our contaminated
CWM has a parameter controlling the proportion of outliers, one controlling the
proportion of leverage points, one specifying the degree of contamination with
respect to the response variables, and one specifying the degree of
contamination with respect to the covariates. Crucially, these parameters do
not have to be specified a priori, adding flexibility to our approach.
Furthermore, once the model is estimated and the observations are assigned to
the groups, a finer intra-group classification into typical points, outliers,
good leverage points, and bad leverage points (concepts of primary importance
in robust regression analysis) can be directly obtained. Relations with other
mixture-based contaminated models are analyzed, identifiability conditions are
provided, an expectation-conditional maximization algorithm is outlined for
parameter estimation, and various implementation and operational issues are
discussed. Properties of the estimators of the regression coefficients are
evaluated through Monte Carlo experiments and compared to the estimators from
the Gaussian CWM. A sensitivity study is also conducted based on a real data
set.
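
As a rough illustration of the intra-group classification described in the abstract, the sketch below works through a deliberately simplified case: one response, one covariate, and a single mixture component whose parameters are taken as already known. It assumes the usual two-part form of a contaminated Gaussian (a "typical" Gaussian with weight alpha plus a Gaussian with the same mean and a variance inflated by eta > 1). All function names, dictionary keys, parameter values, and the 0.5 posterior threshold are hypothetical choices made for this example; they are not taken from the article, and no estimation step is shown.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def contaminated_pdf(x, mean, var, alpha, eta):
    """Contaminated Gaussian density (assumed form): weight alpha on the
    typical Gaussian, weight 1 - alpha on one with variance inflated by eta."""
    return (alpha * normal_pdf(x, mean, var)
            + (1.0 - alpha) * normal_pdf(x, mean, eta * var))


def prob_typical(x, mean, var, alpha, eta):
    """Posterior probability that x comes from the non-inflated part."""
    return alpha * normal_pdf(x, mean, var) / contaminated_pdf(x, mean, var, alpha, eta)


def component_pdf(x, y, comp):
    """Joint density of (x, y) in one component: contaminated density of
    y given x (linear mean b0 + b1 * x) times contaminated density of x."""
    fitted = comp["b0"] + comp["b1"] * x
    dens_y = contaminated_pdf(y, fitted, comp["var_y"], comp["alpha_y"], comp["eta_y"])
    dens_x = contaminated_pdf(x, comp["mu_x"], comp["var_x"], comp["alpha_x"], comp["eta_x"])
    return dens_y * dens_x


def classify_point(x, y, comp, threshold=0.5):
    """Label (x, y) within one component. The threshold is an assumption."""
    typical_x = prob_typical(
        x, comp["mu_x"], comp["var_x"], comp["alpha_x"], comp["eta_x"]) > threshold
    fitted = comp["b0"] + comp["b1"] * x
    typical_y = prob_typical(
        y, fitted, comp["var_y"], comp["alpha_y"], comp["eta_y"]) > threshold
    if typical_x and typical_y:
        return "typical"
    if typical_x:
        return "outlier"        # atypical response, typical covariate
    if typical_y:
        return "good leverage"  # atypical covariate, response on the line
    return "bad leverage"       # atypical covariate and atypical response


# Illustrative (not estimated) parameters for a single component.
comp = {"mu_x": 0.0, "var_x": 1.0, "alpha_x": 0.9, "eta_x": 10.0,
        "b0": 0.0, "b1": 1.0, "var_y": 1.0, "alpha_y": 0.9, "eta_y": 10.0}

labels = {point: classify_point(point[0], point[1], comp)
          for point in [(0.0, 0.0), (0.0, 6.0), (6.0, 6.0), (6.0, 0.0)]}
```

In this toy setting a point far from the covariate mean but close to the regression line is labelled a good leverage point, while one that is far in both respects is labelled a bad leverage point. A full mixture would weight several such components by their mixing proportions and would estimate every parameter from the data.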