Journal article

Finding Outliers in Gaussian Model-based Clustering

Abstract

Clustering, or unsupervised classification, is a task often plagued by outliers. Yet there is a paucity of work on handling outliers in clustering. Outlier identification algorithms tend to fall into three broad categories: outlier inclusion, outlier trimming, and post hoc outlier identification methods, with the first two often requiring pre-specification of the number of outliers. The fact that the sample squared Mahalanobis distance is beta-distributed is used to derive an approximate distribution for the log-likelihoods of subset finite Gaussian mixture models. An algorithm is then proposed that removes the least plausible points according to the subset log-likelihoods, which are deemed outliers, until the subset log-likelihoods adhere to the reference distribution. This results in a trimming method, called OCLUST, that inherently estimates the number of outliers.
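The beta-distribution fact mentioned above can be illustrated for a single Gaussian sample. The sketch below assumes NumPy and uses the standard result that, for n points in p dimensions with sample mean and unbiased sample covariance, the scaled distance n/(n-1)^2 * D_i^2 follows Beta(p/2, (n-p-1)/2). It is not the OCLUST algorithm; the function names `scaled_sq_mahalanobis` and `beta_params` are hypothetical and chosen for illustration only.

```python
import numpy as np


def scaled_sq_mahalanobis(x: np.ndarray) -> np.ndarray:
    """Return n/(n-1)^2 * D_i^2 for each row of x (shape n x p).

    D_i^2 = (x_i - xbar)^T S^{-1} (x_i - xbar), with S the unbiased
    (n-1 denominator) sample covariance. Under a Gaussian model each
    scaled value is Beta(p/2, (n-p-1)/2) distributed.
    """
    n, p = x.shape
    centred = x - x.mean(axis=0)
    # np.cov uses the n-1 denominator by default; atleast_2d handles p == 1
    cov = np.atleast_2d(np.cov(x, rowvar=False))
    # Solve S z_i = (x_i - xbar) instead of forming S^{-1} explicitly
    solved = np.linalg.solve(cov, centred.T).T
    # Row-wise dot product gives D_i^2 for every point
    sq_dist = np.einsum("ij,ij->i", centred, solved)
    return n / (n - 1) ** 2 * sq_dist


def beta_params(n: int, p: int) -> tuple[float, float]:
    """Shape parameters (a, b) of the reference Beta distribution."""
    return p / 2, (n - p - 1) / 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((200, 3))
    u = scaled_sq_mahalanobis(x)
    a, b = beta_params(*x.shape)
    # The sample mean of u equals the Beta mean a/(a+b) = p/(n-1)
    # exactly (up to floating-point error), because sum_i D_i^2 = (n-1)p.
    print(u.mean(), a / (a + b))
```

Points with unusually large scaled distances relative to this Beta reference are the least plausible under the fitted Gaussian; the paper extends this idea to log-likelihoods of subset Gaussian mixture models.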

Authors

Clark KM; McNicholas PD

Journal

Journal of Classification, Vol. 41, No. 2, pp. 313–337

Publisher

Springer Nature

Publication Date

July 1, 2024

DOI

10.1007/s00357-024-09473-3

ISSN

0176-4268
