Estimating the number of clusters in a dataset via consensus clustering


Unlu R., Xanthopoulos P.

EXPERT SYSTEMS WITH APPLICATIONS, vol.125, pp.33-39, 2019 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 125
  • Publication Date: 2019
  • Doi Number: 10.1016/j.eswa.2019.01.074
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
  • Page Numbers: pp.33-39
  • Keywords: Weighted consensus clustering, Validity indices, Number of clusters, MICROARRAY DATA, SELECTION, VALIDATION
  • Abdullah Gül University Affiliated: No

Abstract

In unsupervised learning, finding the appropriate number of clusters (usually denoted k) is very challenging. Its importance lies in the fact that k is a vital hyperparameter for most clustering algorithms. One algorithmic approach to tackling this problem is to run a given clustering algorithm with various cluster configurations and select the one that maximizes a chosen internal validity measure. This approach is promising and computationally efficient, since the independent runs are parallelizable. In this paper, we attempt to improve on this estimation approach by incorporating consensus clustering into the k-estimation scheme. The weighted consensus clustering scheme employs four indices, namely the Silhouette (SH), Calinski-Harabasz (CH), Davies-Bouldin (DB), and Consensus (CI) indices, to estimate the correct number of clusters. Computational experiments on data with the number of clusters ranging from 2 to 7 show the profound advantages of weighted consensus clustering for correctly finding k, compared with an individual clustering method (e.g., k-means) and simple consensus clustering. (C) 2019 Elsevier Ltd. All rights reserved.
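The baseline approach described above, which runs one clustering algorithm over a range of candidate k and keeps the k that optimizes an internal validity index, can be sketched in pure Python. This is a minimal illustrative sketch, not the authors' code, and it does not implement their weighted consensus clustering. It assumes plain k-means with a deterministic farthest-point initialization and scores each run with the Silhouette index only. The helper names (`farthest_point_init`, `kmeans`, `silhouette`, `estimate_k`) and the synthetic three-blob data are hypothetical. The candidate range 2 to 7 simply mirrors the range mentioned in the abstract.

```python
import math
import random


def farthest_point_init(points, k):
    """Hypothetical deterministic init: start at the first point, then keep
    adding the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    return centers


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means; returns one integer label per point."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; higher is better. Singletons count as 0."""
    n = len(points)
    clusters = set(labels)
    total = 0.0
    for i in range(n):
        own = [math.dist(points[i], points[j]) for j in range(n)
               if j != i and labels[j] == labels[i]]
        if not own:
            continue
        a = sum(own) / len(own)  # mean distance to the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(math.dist(points[i], points[j]) for j in range(n) if labels[j] == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def estimate_k(points, k_range=range(2, 8)):
    """Run k-means once per candidate k and keep the k with the best silhouette."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_range}
    return max(scores, key=scores.get), scores


# Hypothetical synthetic data: three well-separated Gaussian blobs, 30 points each.
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
data = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
        for cx, cy in blob_centers for _ in range(30)]

best_k, scores = estimate_k(data)
print("estimated k:", best_k)
for k, s in sorted(scores.items()):
    print(f"k={k}: silhouette={s:.3f}")
```

Each candidate k is an independent run, which is why the abstract notes that this search parallelizes easily. Extending the sketch to several indices needs per-index directions: Silhouette and Calinski-Harabasz are maximized, whereas Davies-Bouldin is minimized.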