Model-based clustering of microarray expression data via latent Gaussian mixture models

Abstract

MOTIVATION: In recent years, work has been carried out on clustering gene expression microarray data. Some approaches are developed from an algorithmic viewpoint whereas others are developed via the application of mixture models. In this article, a family of eight mixture models which utilizes the factor analysis covariance structure is extended to 12 models and applied to gene expression microarray data. This modelling approach builds on previous work by introducing a modified factor analysis covariance structure, leading to a family of 12 mixture models, including parsimonious models. This family of models allows for the modelling of the correlation between gene expression levels even when the number of samples is small. Parameter estimation is carried out using a variant of the expectation-maximization algorithm and model selection is achieved using the Bayesian information criterion. This expanded family of Gaussian mixture models, known as the expanded parsimonious Gaussian mixture model (EPGMM) family, is then applied to two well-known gene expression data sets.

RESULTS: The performance of the EPGMM family of models is quantified using the adjusted Rand index. This family of models gives very good performance, relative to existing popular clustering techniques, when applied to real gene expression microarray data.

AVAILABILITY: The reduced, preprocessed data that were analysed are available at www.paulmcnicholas.info
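The abstract reports clustering performance in terms of the adjusted Rand index. As a rough illustration of what that measure computes, the following is a minimal sketch in plain Python. It is not the authors' code: the function name `adjusted_rand_index` and its argument names are assumptions made for this example, and it uses the standard pair-counting definition of the index (agreement between two labelings, corrected for chance).

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Pair-counting adjusted Rand index between two labelings.

    Hypothetical helper written for illustration only.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same observations")

    n = len(labels_true)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        # Fewer than two observations: the labelings agree trivially.
        return 1.0

    # Contingency table cells and its row/column margins.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Number of observation pairs placed together in each structure.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    # Expected agreement under random labeling with fixed margins.
    expected = sum_rows * sum_cols / total_pairs
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Degenerate case, e.g. both labelings put everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

A value of 1 indicates identical partitions up to relabeling, and the expected value under random labeling is 0, so negative values indicate agreement worse than chance.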

Authors

McNicholas PD; Murphy TB

Journal

Bioinformatics, Vol. 26, No. 21, pp. 2705–2712

Publisher

Oxford University Press (OUP)

Publication Date

November 1, 2010

DOI

10.1093/bioinformatics/btq498

ISSN

1367-4803

Associated Experts

Paul McNicholas

Professor, Faculty of Science
