New Cluster Detection using Semi-supervised Clustering Ensemble Method

In recent years, there has been tremendous development of data acquisition systems, resulting in a whole new set of so-called big data problems. Since these data structures are inherently dynamic and constantly changing, the number of clusters is usually unknown. Furthermore, the "true" number of clusters can depend on the constraints and/or perceptions (biases) set by experts, users, customers, etc., which can also change. In this paper, we propose a new cluster detection algorithm based on a semi-supervised clustering ensemble method. Information fusion techniques have been widely applied in many applications, including clustering, classification, and detection. Although clustering is unsupervised and does not require any training data, in many applications expert opinions are available to label a portion of the data observations. These labels can be viewed as guidance information for combining the cluster labels generated by different local clusterers. The ensemble method consists of two major steps: base clustering generation and fusion. Since the step of generating base clusterings is unsupervised and the step of combining base clusterings is supervised, we refer to the algorithm in this paper as the semi-supervised clustering ensemble algorithm. We then propose to detect a new cluster using the average association vector computed for each data point by the semi-supervised method.
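To make the idea of an average association vector concrete, the following is a minimal sketch and not the algorithm as defined in the paper. It rests on several assumptions that the abstract does not state: the association of a cluster with a known class is taken to be the fraction of that cluster's labeled points belonging to the class, a cluster with no labeled points is given zero association with every class, and a point is flagged as a new-cluster candidate when its largest average association falls below a hypothetical threshold. The function names, the threshold value, and the toy data are all illustrative.

```python
from collections import Counter, defaultdict


def average_association(base_clusterings, labels, classes):
    """Average, over all base clusterings, each point's association with each class.

    base_clusterings: list of lists; base_clusterings[m][i] is the cluster id
        assigned to point i by base clustering m (the unsupervised step).
    labels: dict mapping a point index to its expert-provided class.
    classes: ordered list of the known classes.
    Assumption: association = share of a cluster's labeled points in a class.
    """
    n_points = len(base_clusterings[0])
    totals = [[0.0] * len(classes) for _ in range(n_points)]
    for clustering in base_clusterings:
        # Count labeled points per (cluster, class) for this base clustering.
        counts = defaultdict(Counter)
        for i, cls in labels.items():
            counts[clustering[i]][cls] += 1
        for i, cluster_id in enumerate(clustering):
            per_class = counts.get(cluster_id)
            if not per_class:
                continue  # no labeled points in this cluster: contributes zeros
            n_labeled = sum(per_class.values())
            for k, cls in enumerate(classes):
                totals[i][k] += per_class[cls] / n_labeled
    n_clusterings = len(base_clusterings)
    return [[v / n_clusterings for v in row] for row in totals]


def detect_new_cluster(assoc, threshold=0.5):
    """Flag points whose strongest class association is below the threshold."""
    return [i for i, row in enumerate(assoc) if max(row) < threshold]


# Toy data: points 0-2 belong to class "A", 3-5 to class "B",
# and points 6-7 form a group the experts have not labeled.
base_clusterings = [
    [0, 0, 0, 1, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1, 1, 2],  # this clustering merges point 6 into B's cluster
    [1, 1, 1, 0, 0, 0, 2, 2],
]
labels = {0: "A", 1: "A", 3: "B", 4: "B"}
classes = ["A", "B"]

assoc = average_association(base_clusterings, labels, classes)
new_points = detect_new_cluster(assoc)
print(new_points)  # [6, 7]
```

In this toy example, point 6 is associated with class "B" in only one of the three base clusterings, so its average association stays low and it is flagged together with point 7. The threshold trades off missed new clusters against false alarms and would need to be chosen for the data at hand.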