abstract
- Clustering analysis is widely used in bioinformatics and biochemistry for a variety of applications, such as detecting new cell types and evaluating drug response. Because different applications and cells may require different clustering algorithms, combining multiple clustering results into a consensus clustering through distributed clustering is a popular and efficient way to improve the quality of clustering analysis. Existing solutions are commonly based on unsupervised techniques, which do not require any a priori knowledge. However, in certain cases, prior information on particular labelings may be available, and using it is expected to improve performance. To this end, we propose two semi-supervised distributed clustering algorithms and evaluate their performance for different base clusterings.