Estimating the number of clusters in a dataset via consensus clustering


Unlu R., Xanthopoulos P.

EXPERT SYSTEMS WITH APPLICATIONS, vol.125, pp.33-39, 2019 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 125
  • Publication Date: 2019
  • DOI: 10.1016/j.eswa.2019.01.074
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Indexes Covering the Journal: Science Citation Index Expanded (SCI-EXPANDED), Social Sciences Citation Index (SSCI), Scopus
  • Pages: pp.33-39
  • Keywords: Weighted consensus clustering, Validity indices, Number of clusters, MICROARRAY DATA, SELECTION, VALIDATION
  • Affiliated with Abdullah Gül University: No

Abstract

In unsupervised learning, the problem of finding the appropriate number of clusters, usually denoted k, is very challenging. Its importance lies in the fact that k is a vital hyperparameter for most clustering algorithms. One algorithmic approach for tackling this problem is to apply a given clustering algorithm with various cluster configurations and select the one that maximizes a chosen internal validity measure. This is a promising and computationally efficient approach, since the independent runs are parallelizable. In this paper, we attempt to improve on this estimation approach by incorporating consensus clustering into the k-estimation scheme. The weighted consensus clustering scheme employs four different indices, namely the Silhouette (SH), Calinski-Harabasz (CH), Davies-Bouldin (DB), and Consensus (CI) indices, to estimate the correct number of clusters. Computational experiments on a dataset with clusters ranging from 2 to 7 show the profound advantages of weighted consensus clustering for correctly finding k in comparison to an individual clustering method (e.g., k-means) and simple consensus clustering. © 2019 Elsevier Ltd. All rights reserved.
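
The baseline approach described in the abstract (run one clustering algorithm for several candidate values of k and keep the k that maximizes an internal validity index) can be sketched roughly as follows. This is only an illustration and not the authors' weighted consensus scheme: it uses the Silhouette index alone, a plain k-means with a deterministic farthest-first initialization (an assumption made here for reproducibility), and hypothetical helper names (`kmeans`, `silhouette`, `estimate_k`) with made-up toy data.

```python
import math


def farthest_first_centers(points, k):
    """Deterministic init (assumption): start at the first point, then
    repeatedly add the point farthest from all centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, iters=50):
    """Plain Lloyd-style k-means; returns one cluster label per point."""
    centers = farthest_first_centers(points, k)
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean Silhouette value; points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return -1.0  # index is undefined for one cluster; treat as worst case
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: mean distance to the nearest other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for m, other in clusters.items()
            if m != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def estimate_k(points, k_values):
    """Score each candidate k and return the best k with all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Toy data (made up): three tight groups around (0, 0), (10, 0), (0, 10).
offsets = [(0.0, 0.0), (0.3, 0.1), (0.1, 0.4), (0.4, 0.3)]
data = [
    (cx + dx, cy + dy)
    for cx, cy in [(0, 0), (10, 0), (0, 10)]
    for dx, dy in offsets
]
best_k, scores = estimate_k(data, range(2, 7))
print(best_k, scores)
```

Because each candidate k is scored independently, the loop inside `estimate_k` is the part that could be run in parallel, which is the efficiency argument the abstract makes. The paper's contribution replaces the single index with a weighted combination of several indices inside a consensus clustering scheme, which this sketch does not attempt to reproduce.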