All these approaches use the efficient graph partitioning algorithm METIS by Karypis and Kumar to partition graphs induced by the cluster ensemble and find the consensus clustering. Note that there is implicitly an additional constraint in these solutions, namely that the consensus clusters obtained be of comparable size. We describe these and other algorithms in the following subsections.
The intuition is that the more similar two data-points are, the higher the chance that the constituent clusterings will place them in the same cluster. This similarity matrix, viewed as a graph, can be clustered using any reasonable pairwise-similarity-based clustering algorithm to obtain the final clustering. Since the similarity matrix is quadratic in the number of instances, this approach is more expensive in terms of resources than the algorithms that will be introduced next.
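To make the construction concrete, the short sketch below (function and variable names are illustrative, not from the original papers) builds the co-association similarity matrix from the hard labels of the constituent clusterings; any pairwise-similarity clustering method (METIS on the induced graph in the original formulation) can then be applied to it.

```python
import numpy as np

def coassociation_matrix(labelings):
    """Fraction of constituent clusterings that place each pair of
    instances in the same cluster (the CSPA similarity matrix)."""
    labelings = np.asarray(labelings)          # shape (r, n): r hard labelings
    r, n = labelings.shape
    sim = np.zeros((n, n))
    for labels in labelings:
        # 1 where instances i and j share a cluster in this clustering
        sim += (labels[:, None] == labels[None, :]).astype(float)
    return sim / r                              # entries in [0, 1]

# toy ensemble: 3 clusterings of 5 instances
ensemble = [[0, 0, 1, 1, 2],
            [1, 1, 0, 0, 0],
            [0, 1, 1, 2, 2]]
print(coassociation_matrix(ensemble))
```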
Hyper-graph partitioning seeks to cluster the data by eliminating the minimal number of hyper-edges. In the HGPA algorithm all the vertices and hyper-edges are weighted equally. Hence, we will not discuss this algorithm or report any results for it in the remainder of this chapter.
First, it tries to solve the cluster correspondence problem and then uses voting to place data-points into the final consensus clusters. The cluster correspondence problem is solved by grouping the clusters identified in the individual clusterings of the ensemble. As we have seen earlier, the matrix H represents each cluster as an n-length binary vector. A similarity graph, with clusters as nodes and edge weights given by the binary Jaccard similarity between the corresponding cluster vectors, is partitioned into meta-clusters using METIS. The final clustering of instances is produced in the following fashion.
All the clusters in each meta-cluster are collapsed to yield an association vector for the meta-cluster. Each instance is then assigned to the meta-cluster with which it is most associated.
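The following sketch (hypothetical names; for brevity, the METIS partitioning of the meta-graph is replaced by a hard-coded grouping of clusters) illustrates the two MCLA ingredients just described: the pairwise Jaccard similarities between clusters, and the collapse-and-compete step.

```python
import numpy as np

def jaccard(u, v):
    """Binary Jaccard similarity between two cluster indicator vectors."""
    inter = np.minimum(u, v).sum()
    union = np.maximum(u, v).sum()
    return inter / union if union > 0 else 0.0

def mcla_consensus(indicator, meta_labels):
    """Collapse clusters into meta-clusters and assign each instance to
    the meta-cluster it is most associated with.
    indicator   : (n, total_clusters) binary matrix H, one column per cluster
    meta_labels : meta-cluster id for each column (e.g. from partitioning
                  the Jaccard meta-graph)"""
    k = meta_labels.max() + 1
    n = indicator.shape[0]
    assoc = np.zeros((n, k))
    for m in range(k):
        cols = indicator[:, meta_labels == m]
        assoc[:, m] = cols.mean(axis=1)        # association vector of meta-cluster m
    return assoc.argmax(axis=1)                # compete for objects

# 4 instances, 2 clusterings with 2 clusters each (columns of H)
H = np.array([[1, 0, 1, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 1, 0, 1]])
# meta-graph of pairwise Jaccard similarities between clusters
sim = np.array([[jaccard(H[:, a], H[:, b]) for b in range(H.shape[1])]
                for a in range(H.shape[1])])
print(sim)
# the grouping of clusters into meta-clusters is hard-coded here;
# MCLA obtains it by partitioning the meta-graph with METIS
print(mcla_consensus(H, np.array([0, 1, 0, 1])))
```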
The cluster similarity matrix can be computed in time quadratic in the number of clusters in the ensemble. This is often significantly less than n^2.
Furthermore, the averaging and voting operations are linear in n. This makes MCLA computationally very efficient. The CSPA algorithm models the ensemble as a graph with the vertices representing instances in the data, while the MCLA algorithm models the ensemble as a graph of clusters.
The HBGF technique combines these two ideas and represents the ensemble by a bipartite graph in which the individual data points and the clusters of the constituent clusterings are both vertices. The graph is bipartite because there are no edges between two vertices that are both instances or both clusters.
The clusters in the consensus clustering contain both instances and the original clusters. Hence, the method yields a co-clustering solution.
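A minimal sketch of this bipartite representation, assuming the ensemble is available as the indicator matrix H described earlier (for a soft ensemble, the same construction would use the posterior probabilities as edge weights); the partitioning of the resulting graph is left to a tool such as METIS and is not shown.

```python
import numpy as np

def hbgf_adjacency(indicator):
    """Adjacency matrix of the HBGF bipartite graph.
    Vertices 0..n-1 are instances, vertices n.. are clusters; an edge links
    an instance to every cluster that contains it (weight 1 for hard
    ensembles, the posterior probability for soft ones)."""
    n, c = indicator.shape
    adj = np.zeros((n + c, n + c))
    adj[:n, n:] = indicator          # instance-to-cluster edges
    adj[n:, :n] = indicator.T        # symmetric counterpart
    return adj                       # no instance-instance or cluster-cluster edges

H = np.array([[1, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 1, 0, 1]])
print(hbgf_adjacency(H).shape)       # (3 + 4, 3 + 4)
```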
This method has also been previously used to simultaneously cluster words and documents by Dhillon. While this is significantly less than quadratic in the number of instances, as in CSPA, in practice we observe the algorithm to be fairly resource hungry both in terms of CPU time and storage. The Evidence Accumulation framework uses a K-Means type algorithm to produce several clusterings, each with a random initialization. The number of clusters specified in each K-Means clustering is typically much larger than the true number of clusters desired.
The data instances are then mapped into the similarity space where the similarity between two instances i and j is the fraction of clusterings in which they ended up in the same cluster.
A Minimum-Spanning-Tree-based clustering algorithm is then used to obtain the final clustering. In practice any appropriate clustering technique could be employed. This framework and the consensus function that it uses are very similar to the Cluster Ensemble framework and the CSPA algorithm of Strehl and Ghosh. The actual consensus function used in this algorithm only works on a heavily restricted type of ensemble: each constituent clustering must have the same number of clusters.
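Under the assumption that a single-linkage cut is an acceptable stand-in for the Minimum-Spanning-Tree clustering step (the two are equivalent for this purpose), a compact sketch of the whole pipeline might look as follows; names and the toy ensemble are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def evidence_accumulation(labelings, n_final_clusters):
    """Co-association similarity followed by a single-linkage cut, which is
    equivalent to clustering over the minimum spanning tree of the
    dissimilarity graph."""
    labelings = np.asarray(labelings)
    r, n = labelings.shape
    coassoc = sum((l[:, None] == l[None, :]).astype(float) for l in labelings) / r
    dist = 1.0 - coassoc                       # turn similarity into dissimilarity
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist, checks=False), method="single")
    return fcluster(Z, t=n_final_clusters, criterion="maxclust")

ensemble = [[0, 0, 1, 1, 2, 2],
            [0, 0, 0, 1, 1, 1],
            [1, 1, 2, 2, 0, 0]]
print(evidence_accumulation(ensemble, 3))
```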
Also, Fern and Brodley extended this approach to accept soft clusterings as input. The details of this approach are presented in Subsection 1. In a different approach, it is assumed that the ensemble has been generated from a mixture of multi-dimensional multinomial distributions. Each data point is generated by first picking a component of the mixture according to the priors. After picking a component of the mixture, the cluster label in each clustering is picked from a multinomial distribution over the cluster labels. An EM procedure is used to estimate the parameters, and this procedure, along with the estimated parameters, provides us with a soft final clustering.
The cluster labels of different constituent clusterings are assumed to be conditionally independent given the mixture component. The number of parameters to be estimated increases with both the number of constituent clusterings as well as with the number of clusters in them. Experiments in Topchy et al. were reported on relatively simple datasets.
In this chapter we will evaluate the performance of this consensus function on more complex real-life datasets. One advantage of this approach is that it can easily model final clusters of different sizes.
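A minimal EM sketch of a mixture-of-multinomials consensus of the kind described above; it is not the exact procedure of Topchy et al., and all function and variable names are our own.

```python
import numpy as np

def mixture_consensus(labelings, k, n_iter=100, seed=0):
    """EM for a mixture of multinomials over the ensemble's label vectors.
    labelings: (r, n) integer cluster labels from r constituent clusterings.
    Returns soft consensus memberships of shape (n, k)."""
    rng = np.random.default_rng(seed)
    labelings = np.asarray(labelings)
    r, n = labelings.shape
    sizes = labelings.max(axis=1) + 1              # clusters per constituent clustering
    priors = np.full(k, 1.0 / k)
    # one multinomial over labels per (component, clustering), randomly initialized
    thetas = [rng.dirichlet(np.ones(sizes[q]), size=k) for q in range(r)]
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability
        log_resp = np.log(priors)[None, :].repeat(n, axis=0)
        for q in range(r):
            log_resp += np.log(thetas[q][:, labelings[q]]).T   # (n, k)
        log_resp -= log_resp.max(axis=1, keepdims=True)
        resp = np.exp(log_resp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update priors and the per-clustering multinomials
        priors = resp.mean(axis=0)
        for q in range(r):
            for c in range(sizes[q]):
                thetas[q][:, c] = resp[labelings[q] == c].sum(axis=0)
            thetas[q] = (thetas[q] + 1e-9) / (thetas[q] + 1e-9).sum(axis=1, keepdims=True)
    return resp

ensemble = [[0, 0, 1, 1, 2, 2],
            [0, 0, 0, 1, 1, 1],
            [1, 1, 2, 2, 0, 0]]
memberships = mixture_consensus(ensemble, k=3)
print(memberships.argmax(axis=1))                  # hardened consensus labels
```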
Graph partitioning methods tend to yield roughly balanced clusters. This is a disadvantage in situations where the data distribution is not uniform. Using the priors in the mixture model, the distribution of the data can be accommodated conveniently. Here, we recount some research on the impact of diversity on cluster ensembles.
Ghosh et al. and Fern and Brodley report on some experiments on the diversity of ensembles. Kuncheva and Hadjitodorov study the diversity of ensembles using multiple measures like the Rand Index, the Jaccard measure, etc. Based on this study they propose a variant of the Evidence Accumulation framework where the number of over-produced clusters is randomly chosen. This randomization in ensemble generation is shown to increase the diversity of the ensembles, thereby leading to better consensus clusterings.
In a recent follow-up work, Hadjitodorov et al. continue this study. In order to objectively evaluate our new approach, we will describe changes to existing techniques mentioned in Section 1. This is the hard labeling defined in Subsection 1. A soft cluster ensemble is shown in Table 1.
Instead of hardening the posterior probabilities into cluster labels, we construct a matrix S^q representing the solution of the q-th soft clustering algorithm. S^q has a column for each cluster generated in the q-th clustering, and the rows denote the instances of the data, with S^q_ij being the probability of x_i belonging to cluster j of the q-th clustering.
Hence, the values in each row of S^q sum up to 1. There are r such clusterings, S^1, ..., S^r, in the ensemble. In this chapter we only work with algorithms that output hard final clusterings. But it is not at all obvious that this additional information is useful. The goal of this study is to show empirically that algorithms designed for soft ensembles improve upon the accuracy of those that operate on the hardened versions of the ensembles.
Here, we will try to intuitively explain why we expect this. For the sake of discussion, consider a cluster ensemble where individual clusterings are working on vertically partitioned data. In such a scenario, the underlying clustering algorithms have access to different and often incomplete sets of features. Incomplete data could result from distributed computing constraints (Ghosh et al.).
Under such circumstances there is an increased chance that the underlying clustering algorithms will not be able to assign some objects to clusters with much certainty. If the combining procedure were to accept only hard clusterings, these objects would have to be assigned to the cluster they most belong to (the one with the highest posterior probability).
Consider the soft ensemble depicted in Table 1. The solution S^2 assigns x3 to clusters s4, s5, and s6 with probabilities 0. The combining algorithm would have no evidence that the 2nd underlying clustering algorithm was unsure about the assignment of x3. It would accept this observation with the same amount of certainty as any other observation that assigns a data-point xi to a cluster sj with 0. If, however, the combining function were to accept soft clusterings, it could potentially use this information to make an appropriate cluster assignment of x3 in the combined clustering.
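A toy illustration with made-up numbers of the information that hardening discards: the uncertain instance below becomes indistinguishable from the confident ones once posteriors are replaced by labels.

```python
import numpy as np

# posterior membership matrix S^q of one soft constituent clustering:
# rows are instances, columns are that clustering's clusters, rows sum to 1
S_q = np.array([[0.90, 0.05, 0.05],
                [0.34, 0.33, 0.33],     # a genuinely uncertain instance
                [0.05, 0.05, 0.90]])
assert np.allclose(S_q.sum(axis=1), 1.0)

# hardening throws the uncertainty away: instance 1 now looks as confident as the others
hard_labels = S_q.argmax(axis=1)
print(hard_labels)                       # [0 0 2]

# the corresponding hard indicator matrix used by label-based consensus functions
H_q = np.eye(S_q.shape[1])[hard_labels]
print(H_q)
```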
This algorithm is very similar to the K-Means algorithm, differing only in the fact that, as a measure of distance, it uses the KL-divergence (Kullback and Leibler) instead of the Euclidean distance.
The reader is referred to the original paper for more details. Here we just describe the mapping of the soft cluster ensemble problem to the information-theoretic K-Means problem. Each instance in a soft ensemble is represented by a concatenation of r posterior membership probability distributions obtained from the constituent clustering algorithms (see matrix S in Table 1).
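The sketch below shows the kind of KL-divergence K-Means iteration this mapping leads to; it is a simplified stand-in for the information-theoretic K-Means algorithm of the original paper, with illustrative names and a toy input.

```python
import numpy as np

def kl_kmeans(P, k, n_iter=50, seed=0, eps=1e-12):
    """K-Means-style clustering that assigns each row distribution to the
    centroid with the smallest KL divergence KL(p || m); centroids are the
    means of their members' distributions.
    P: (n, d) matrix whose rows are probability distributions (sum to 1)."""
    rng = np.random.default_rng(seed)
    n, d = P.shape
    centroids = P[rng.choice(n, size=k, replace=False)]
    labels = np.zeros(n, dtype=int)
    for _ in range(n_iter):
        # KL(p || m) for every instance/centroid pair
        div = np.array([np.sum(P * (np.log(P + eps) - np.log(m + eps)), axis=1)
                        for m in centroids]).T          # (n, k)
        labels = div.argmin(axis=1)
        for j in range(k):
            members = P[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels

# instances represented by concatenated (and renormalized) posteriors
S = np.array([[0.45, 0.05, 0.40, 0.10],
              [0.40, 0.10, 0.35, 0.15],
              [0.05, 0.45, 0.10, 0.40],
              [0.10, 0.40, 0.15, 0.35]])
S = S / S.sum(axis=1, keepdims=True)
print(kl_kmeans(S, k=2))
```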
Here we note that this is equivalent to computing the KL divergence between instances represented by a matrix S in which each row sums up to one. This normalization can be performed by multiplying each value in S^q by w^q / (Σ_{q'} w^{q'}). Computing Equation 1. We can, however, imagine a scenario where we have different importance values for the constituent clusterings.
These values could, for instance, be our confidence in the accuracy of these clusterings, possibly based on the number of features they access. These confidence values can be easily integrated into the cost function using the weights w^q. This algorithm is described in Section 1. Thus, the technique first transforms the objects into a label space and then interprets the dot product between the vectors representing the objects as their similarity.
In our experiments we use the Euclidean distance in the label space to obtain our similarity measure. The dot product is highly correlated with the Euclidean measure, but the Euclidean distance provides cleaner semantics. Another distance measure can be defined on the instances in a soft ensemble using the KL-divergence (Kullback and Leibler), as in Section 1. In our results we observed that all versions of sCSPA (with Euclidean distance, KL divergence, and cosine similarity) gave very similar results.
The results obtained while using Euclidean distance were sometimes better, so here we will report results based only on that version of sCSPA. Fern and Brodley proposed a variant of the Evidence Accumulation framework that accepts soft clusterings.
In this scenario, the similarity of two instances is calculated as the average dot product of the probability distributions describing them. The input matrix used by this framework is essentially equivalent to the one used by sCSPA using Cosine similarity. The only difference is in the combining function.
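The average-dot-product similarity just described reduces to a single matrix product over the concatenated posterior matrices; a small sketch (hypothetical helper name, toy values) follows.

```python
import numpy as np

def soft_coassociation(soft_clusterings):
    """Average dot product of the membership distributions of every pair of
    instances across the r constituent soft clusterings."""
    r = len(soft_clusterings)
    S = np.hstack(soft_clusterings)          # (n, total number of clusters)
    return S @ S.T / r

S1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
S2 = np.array([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]])
print(soft_coassociation([S1, S2]))
```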
Hence, we will not experiment with this technique further in this chapter. The idea is to group and collapse related clusters into meta-clusters, and then assign each object to the meta-cluster to which it belongs most strongly. The clusters are grouped by graph-partitioning-based clustering. The Euclidean distance between the membership columns of two clusters is a measure of the difference of the memberships of all objects to these two clusters. Since each vertex in the meta-graph represents a distinct cluster label, a meta-cluster represents a group of corresponding cluster labels.
Collapse Meta-Clusters using Weighting: We now collapse all the clusters contained in each meta-cluster to form its association vector. This association vector is computed as the mean of the association vectors for each cluster that is grouped into the meta-cluster.
This is a weighted form of the step performed in MCLA. Compete for Objects: Each object is assigned to the meta-cluster to which it is most associated. There is, however, one problem with this approach. Because we are using soft clusterings as inputs, the co-association graph of the clusters (the meta-graph) is almost complete. More specifically, even clusters from the same clustering have non-zero similarity to each other.
This is not the case with MCLA, since it uses a binary Jaccard measure, and for hard clusterings the Jaccard similarity between clusters from the same clustering is necessarily zero. Hence, sMCLA forces the similarity of hyper-edges coming from the same clustering to be zero. This is, however, only done when the number of clusters in all the constituent clusterings is equal to the desired final number of clusters.
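A sketch of the sMCLA meta-graph construction with this blocking step; the mapping from Euclidean distance to edge weight (1 / (1 + d)) is our own assumption for illustration, since only the use of the distance is stated above, and the subsequent METIS partitioning is not shown.

```python
import numpy as np

def smcla_metagraph(soft_clusterings):
    """Meta-graph over all clusters of a soft ensemble: edge weights decrease
    with the Euclidean distance between the clusters' membership columns, and
    weights between clusters of the same constituent clustering are forced to
    zero (as sMCLA does when all clusterings have the desired number of
    clusters)."""
    S = np.hstack(soft_clusterings)                       # (n, c): columns are clusters
    origin = np.concatenate([[q] * sc.shape[1]            # which clustering owns each column
                             for q, sc in enumerate(soft_clusterings)])
    dist = np.linalg.norm(S[:, :, None] - S[:, None, :], axis=0)   # (c, c) column distances
    sim = 1.0 / (1.0 + dist)                              # one simple distance-to-weight map
    sim[origin[:, None] == origin[None, :]] = 0.0         # block within-clustering edges
    np.fill_diagonal(sim, 0.0)
    return sim

S1 = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]])
S2 = np.array([[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]])
print(smcla_metagraph([S1, S2]))
```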
In ensembles where the number of clusters in the underlying clusterings varies, the algorithm does not force the co-association matrix to be r-partite. The HBGF approach can be trivially adapted to consider soft ensembles, since the graph partitioning algorithm METIS accepts weights on the edges of the graph to be partitioned. In this section we describe the experimental setup in detail. Some basic properties of the datasets we use are summarized in Table 1. These datasets were selected so as to present our algorithms with problems of varying degrees of difficulty, in terms of the number of desired clusters, the number of attributes, and the number of instances.
It was generated from 5 multivariate Gaussian distributions in 8-dimensional space, with an equal number of points drawn from each. The clusters all have the same variance but different means. The means were drawn from a uniform distribution within the unit hypercube.
We removed some nominal features which corresponded to the context (like sex, name, etc.) and only retained the 10 real-valued features. There are 11 classes in the data and an average of 93 instances per class. It contains 16 spatial features for each of the instances. In order to get better clustering results, we normalized the columns (features) to sum to 1. Real-valued features corresponding to their chemical and optical properties describe the instances.
The instances are categorized into 6 classes, such as tableware, containers, etc., based on 9 attributes. Each data-point is described by a set of 30 Hyper-Spectral signatures pruned from an initial set of features. The pruning was performed by a best-basis feature extraction procedure (Kumar et al.). The dataset has 13 classes describing the geographical features apparent in the pixel. This is a fairly hard problem, and this shows in the clustering results we obtain.
The instances are each characterized by 8 attributes, and there are 10 classes in the dataset. A vehicle may be viewed from one of many different angles. The silhouette instances are classified into 4 vehicle categories: Opel, Saab, Bus, and Van.
Note here that for each soft cluster ensemble we also stored its corresponding hardened version, in order to evaluate methods that only accept hard clusterings. The individual clusterings in our ensembles were created using the EM algorithm (Dempster et al.), with each constituent clustering based on a randomly chosen subset of the attributes, as described below. This partial view of the data, as well as the dependence of EM on initialization, resulted in the diversity of the individual clustering solutions in an ensemble.
As mentioned above, we wanted to evaluate our algorithms on ensembles of varying degrees of difficulty. For this purpose we created ensembles by varying two parameters that control the degree of difficulty. The first parameter is the number of attributes that the EM algorithm accesses while creating the constituent clusterings. We expect the difficulty of an ensemble containing clusterings created from fewer attributes to be higher.
The second parameter is the number of constituent clusterings in the ensemble. In general, we expect that as the number of constituent clusterings increases, the consensus clusterings obtained should be more accurate. For most datasets the number of clusterings in the ensembles is varied from 2 to 10, and in some cases further. The entire set of options for all the datasets is listed in Table 1. The second column in the table describes the different settings for the number of features used to create the clusterings.
For instance, for the 8D5K dataset we can obtain ensembles with constituent clusterings created using 3, 4, 5, or 6 attributes. Also, for each such setting we can select from 10 clusterings to form an ensemble. Of course, each of these 10 clusterings is created with a randomly selected set of attributes. Hence, while creating an ensemble we specify three parameters: the dataset name, the number of attributes, and the number of clusterings.
For each set of parameter values, we create multiple ensembles by randomly selecting the clusterings to combine. Also, non-deterministic consensus functions are run multiple times in order to average out variations in results due to initialization. Here we must note that each individual clustering, as well as the consensus function, is given the true number of clusters to find. The use of ensembles for finding the true number of clusters, and the effect of different k in the constituent clusterings on ensemble accuracy, are not investigated in this study.
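The ensemble-generation procedure described above can be sketched as follows; the dataset, parameter values, and helper names are illustrative, and a Gaussian-mixture EM from scikit-learn stands in for the EM-based constituent clusterer.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def make_soft_ensemble(X, k, n_clusterings, n_features, seed=0):
    """Create a soft cluster ensemble by running EM (Gaussian mixture) on
    randomly chosen feature subsets, one subset per constituent clustering."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for q in range(n_clusterings):
        cols = rng.choice(X.shape[1], size=n_features, replace=False)
        gmm = GaussianMixture(n_components=k, random_state=int(rng.integers(10**6)))
        gmm.fit(X[:, cols])
        ensemble.append(gmm.predict_proba(X[:, cols]))   # S^q: (n, k) posteriors
    return ensemble

# toy data standing in for a real dataset
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.5, size=(50, 8)) for m in (0.0, 2.0, 4.0)])
ensemble = make_soft_ensemble(X, k=3, n_clusterings=5, n_features=4)
hardened = [S_q.argmax(axis=1) for S_q in ensemble]       # hard versions for comparison
print(len(ensemble), ensemble[0].shape)
```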
Both of these criteria compare the obtained clustering to the true labels of the instances. We also use the Geometric Mean Ratio to present an overall score for the performance of each algorithm. In our evaluation, X will be the consensus clustering while Y will be the true labels. With these properties, NMI is extensively used for evaluating clustering algorithms in the literature.
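For reference, a small sketch of NMI as it is commonly defined (mutual information normalized by the geometric mean of the two entropies, as in Strehl and Ghosh); we assume this is the variant intended, since the exact formula is not reproduced here.

```python
import numpy as np

def nmi(labels_a, labels_b):
    """Normalized mutual information I(A;B) / sqrt(H(A) * H(B)) computed from
    the contingency table of the two labelings."""
    labels_a, labels_b = np.asarray(labels_a), np.asarray(labels_b)
    n = len(labels_a)
    cont = np.zeros((labels_a.max() + 1, labels_b.max() + 1))
    for a, b in zip(labels_a, labels_b):
        cont[a, b] += 1
    pxy = cont / n
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0
    mi = np.sum(pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz]))
    hx = -np.sum(px[px > 0] * np.log(px[px > 0]))
    hy = -np.sum(py[py > 0] * np.log(py[py > 0]))
    return mi / np.sqrt(hx * hy) if hx > 0 and hy > 0 else 0.0

print(nmi([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2]))   # 1.0: identical up to relabeling
print(nmi([0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]))   # 0.0: no information shared
```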
The Adjusted Rand index compares two labellings based on whether pairs of objects are placed in the same or different clusters in them. The maximum value it takes is 1, and its expected value is 0. Hence we will only report the NMI score in this chapter. The CVC is calculated by the following procedure. The CVC measure weighs the contribution of a cluster to the average by its size.
There are other issues with this measure, however. The CVC measure is biased towards solutions with a large number of very small pure clusters.