Research Article

Development of Various Validity Indices for Fuzzy K-Means Cluster Analysis

Lee, Suhyeon¹ · Kim, Jaeyun¹ · Jung, Yeongseon¹

¹ Chonnam National University

Published: August 2017 · Vol. 46 No. 4 · pp. 1201-1226

DOI: https://doi.org/10.17287/kmr.2017.46.4.1201

Abstract

Cluster analysis (clustering) is used in many fields, such as finance, marketing, and operations management, to identify groups of homogeneous cases. For that reason, the results of cluster analysis are regarded as a core element in maximizing a firm's value. Because the number of clusters in a clustering problem is usually unknown, it is important to evaluate the clustering results produced by different parameter settings. After a range of candidate numbers of clusters has been evaluated, the best partition is selected on the basis of cluster validity analysis. A cluster validity index (CVI) is an indicator that provides a way to validate the quality of clustering results and to determine the correct number of clusters in a dataset. A CVI is composed of the sum or ratio of a compactness measure and a separability measure, where compactness indicates how concentrated the data are within each cluster and separability refers to the distances between clusters. A good clustering result has a smaller compactness value (that is, less within-cluster scatter) and a larger separability value. This research covers the theory of CVIs used to verify the effectiveness of Fuzzy K-means clustering results. Depending on how the compactness and separability measures are combined, several CVIs have been developed. CVIs calculated as the ratio of compactness to separability, or vice versa, include the Dunn index, the DB index, and the XB index, while the SD index and the S_Dbw index were developed as weighted sums of the two measures. In addition, several variants of conventional CVIs have recently been proposed. However, most existing CVIs are sensitive to arbitrarily shaped clusters, sub-clusters, and outliers, because the compactness of such clusters is not obvious in the original domain.
We propose new CVIs that apply the concept of Support Vector Data Description (SVDD) to the compactness calculation of each cluster, treating compactness and separability separately, for several indices whose effectiveness is well established: Dunn (DU), Calinski and Harabasz (CH), and Davies-Bouldin (DB). Comparative experiments using the Fuzzy K-means clustering algorithm on various benchmark instances verified that the new CVIs perform very well. Their performance is particularly promising on data with noise, skewed distributions, sub-clusters, and arbitrary shapes. In this research, the concept of SVDD was applied to the compactness measure, and the resulting CVIs were verified to be effective in assessing cluster validity. The proposed compactness calculation method is expected to be widely applicable to many other CVIs. As research on cluster analysis expands and diversifies, this work is expected to broaden the application scope of SVDD and to contribute to the development of both cluster analysis and the concept of CVI.
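
As an illustration of the ratio-type CVIs mentioned above, the following is a minimal sketch of the conventional XB (Xie-Beni) index for a fuzzy partition: membership-weighted within-cluster scatter (compactness) divided by the smallest squared distance between cluster centers (separability). It is not the SVDD-based compactness proposed in this research. The function names, the argument layout, and the default fuzzifier m = 2 are assumptions made for the example.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def xie_beni(points, centers, memberships, m=2.0):
    """Conventional Xie-Beni index; smaller values indicate a better partition.

    points:      list of n data points
    centers:     list of c cluster centers (c >= 2)
    memberships: c x n nested list, memberships[i][k] is the degree to which
                 point k belongs to cluster i
    m:           fuzzifier (assumed default of 2)
    """
    n = len(points)
    # Compactness: membership-weighted squared distances to the centers,
    # averaged over the number of points.
    compactness = sum(
        (memberships[i][k] ** m) * sq_dist(points[k], centers[i])
        for i in range(len(centers))
        for k in range(n)
    ) / n
    # Separability: smallest squared distance between any two centers.
    separation = min(sq_dist(a, b) for a, b in combinations(centers, 2))
    return compactness / separation


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Partition A groups the points by their x-coordinate.
xb_a = xie_beni(points, [(0.0, 0.5), (10.0, 0.5)],
                [[1, 1, 0, 0], [0, 0, 1, 1]])

# Partition B groups them by their y-coordinate, cutting across both pairs.
xb_b = xie_beni(points, [(5.0, 0.0), (5.0, 1.0)],
                [[1, 0, 1, 0], [0, 1, 0, 1]])
```

On this toy data, partition A yields a much smaller index than partition B, which matches the intuition that a good result combines low compactness values with high separability. Evaluating such an index over a range of candidate cluster numbers is how the best partition is typically selected.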

Keywords: Cluster analysis, Validity index, Support vector data description, Compactness, Fuzzy K-means