# A Study of the Comparability of External Criteria for Hierarchical Cluster Analysis

Glenn W. Milligan and Martha C. Cooper, Multivariate Behavioral Research, October 1986

Figure 2. Mean criterion value plots for the 10% density condition (panels: Fowlkes and Mallows, Hubert and Arabie, Rand, Jaccard; x-axis: number of clusters).

difference between this index and the Rand is the presence of the additive term d in both the numerator and denominator. It would appear, then, that the pattern for the Rand index is caused by the term d (see Table 1). Thus, the relative sizes of the clusters being formed have a differential impact on the value of d across hierarchy levels. This is an undesirable result, and it confounds comparisons of index values across levels.

Figure 3. Mean criterion value plots for the 60% density condition (panels: Rand, Hubert and Arabie, Fowlkes and Mallows, Jaccard; x-axis: number of clusters).

The average values for the Jaccard and the Fowlkes and Mallows measures do not change dramatically from one density condition to the next. As such, they are less affected by unequal cluster sizes than the Rand index. The only clear impact on these measures seems to be on their variability when relatively few clusters are present in the hierarchical solution, especially for the 10% density condition (Figure 2).

The Hubert and Arabie adjusted Rand index produced remarkably consistent patterns for the mean across all three density levels. In every case, the mean value was plotted on or very near the zero axis line.
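Table 1 is not reproduced in this section, so the following is a minimal sketch that assumes the conventional pair-count notation: a counts object pairs placed together in both partitions, b and c count pairs placed together in only one of them, and d counts pairs separated in both. Under that assumption the Rand index is (a + d)/(a + b + c + d) and the Jaccard index is a/(a + b + c), which is exactly the "additive term d in both the numerator and denominator" contrast described above. The Fowlkes and Mallows and Hubert and Arabie formulas below are the standard pair-count forms as we understand them; all function names are hypothetical, and the Morey and Agresti variant is omitted because its formula is not given here.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_x, labels_y):
    """Classify every object pair by how two partitions treat it.

    a: together in both, b: together only in X,
    c: together only in Y, d: separated in both.
    """
    if len(labels_x) != len(labels_y):
        raise ValueError("both partitions must label the same objects")
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1
        elif same_x:
            b += 1
        elif same_y:
            c += 1
        else:
            d += 1
    return a, b, c, d


# Degenerate partitions (e.g. all singletons) can make a denominator zero;
# this sketch simply lets Python raise ZeroDivisionError in that case.
def rand(a, b, c, d):
    return (a + d) / (a + b + c + d)  # d appears in numerator and denominator


def jaccard(a, b, c, d):
    return a / (a + b + c)  # the Rand ratio with d removed from both terms


def fowlkes_mallows(a, b, c, d):
    return a / sqrt((a + b) * (a + c))


def hubert_arabie(a, b, c, d):
    m = a + b + c + d                     # all pairs, n(n - 1) / 2
    expected = (a + b) * (a + c) / m      # E[a] with both sets of cluster sizes fixed
    maximum = ((a + b) + (a + c)) / 2
    return (a - expected) / (maximum - expected)


if __name__ == "__main__":
    obtained = [0, 0, 1, 1, 2, 2]    # hypothetical algorithm partition
    criterion = [0, 0, 0, 1, 1, 1]   # hypothetical criterion partition
    counts = pair_counts(obtained, criterion)
    for index in (rand, jaccard, fowlkes_mallows, hubert_arabie):
        print(f"{index.__name__:16s} {index(*counts):.3f}")
```

On these toy partitions the counts are a = 2, b = 1, c = 4, d = 8, and the script prints 0.667 (Rand), 0.286 (Jaccard), 0.471 (Fowlkes and Mallows), and 0.242 (Hubert and Arabie). Eight of the ten "agreements" behind the Rand value are d pairs, which is why that index can look respectable even when a is small.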
The variability of the criterion was smaller than that found for all other indices. The index exhibited somewhat greater variability when few clusters were present in the hierarchical solution than at higher levels. The upper limits on variability indicated that, in the equal density condition, index values as large as .04 would rarely be seen in practice if the match between the clustering and criterion solutions was due to chance. The corresponding upper limits were .10 and .09 for the 10% and 60% density conditions, respectively. Thus, unequal-size clusters have no effect on the mean index value and only mild effects on the variance. Although not shown in Figures 1-3, the results for the Morey and Agresti adjusted Rand index were similar. However, its mean values were consistently larger than 0.0, with typical averages near .05. The variability of the index tended to be larger and the pattern somewhat more homogeneous across hierarchy levels. Under random match conditions, the maximum value that would be seen for the Morey and Agresti index would be approximately .18.

Methods. The results for each of the four clustering methods produced plots generally similar to those given in Figure 2 for the 10% density condition. The most striking result was the variability pattern of the Rand index. For all four methods, this index exhibited the greatest overall variability. However, the variability was not constant across the levels of the hierarchical solution. A restriction in range occurred at specific locations in the hierarchy depending on the clustering method: at the level of 14 to 15 clusters for the single link technique, and near the level of 6 to 7 clusters for the group average method. For the methods that tend to restrict the variance of the constructed clusters, the complete link and Ward's technique, the minimum occurred at the level of 2 or 3 clusters.
Again, given the absence of such patterns for the Jaccard statistic, the results must be due to the use of the term d in the Rand index. The patterns for the Jaccard and the Fowlkes and Mallows indices were similar to each other, except that the mean value of the Jaccard measure was consistently lower. Both indices produced maximum values at the level of two clusters and decreased monotonically as the level increased. The mean patterns for these indices were fairly similar for the complete link, group average, and Ward's methods. The pattern was somewhat different for the single link method, where the mean response was close to a linear function when few clusters were present. In general, the variability of both statistics decreased as the number of clusters in the hierarchical solution increased. The magnitude of the variability appeared to be somewhat method dependent and was greater for the single link and group average techniques.

On the other hand, both adjusted Rand indices displayed the same mean patterns as found for the density factor. Their variability was smaller than that found for the other three indices. The mean response for the Morey and Agresti index was slightly above 0.0 when few clusters were present in the hierarchical solution and rose to a mean of no more than about .07 as the number of clusters increased to 50. The only method-dependent pattern was found for the single link algorithm, where the mean rose to .07 somewhat more slowly than for the other methods. Overall, the results suggested that one would not expect to find a value for the Morey and Agresti index greater than about .15 if the match between the criterion and obtained partitions was due to chance.
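The four methods compared above differ only in how the dissimilarity between a newly merged cluster and each remaining cluster is updated, so they can be sketched with one loop and the Lance-Williams recurrence. This is a minimal illustration under stated assumptions, not the software used in the study: single, complete, and group average linkage here use Euclidean distance, Ward's update assumes squared Euclidean dissimilarities, ties are broken by dictionary order, and the names `merged_dissimilarity` and `hierarchy` are hypothetical. The naive O(n^3) loop is only meant for data sets of the size discussed here (50 to 100 points).

```python
import math
import random


def merged_dissimilarity(method, d_ki, d_kj, d_ij, n_i, n_j, n_k):
    """Lance-Williams update: dissimilarity from cluster k to the union of i and j."""
    if method == "single":
        return min(d_ki, d_kj)
    if method == "complete":
        return max(d_ki, d_kj)
    if method == "average":  # group average (UPGMA)
        return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)
    if method == "ward":  # valid for squared Euclidean dissimilarities
        return ((n_i + n_k) * d_ki + (n_j + n_k) * d_kj - n_k * d_ij) / (n_i + n_j + n_k)
    raise ValueError(f"unknown method: {method}")


def hierarchy(points, method):
    """Return {number_of_clusters: labels} for every level of an agglomerative hierarchy."""
    n = len(points)
    power = 2 if method == "ward" else 1
    dist = {(i, j): math.dist(points[i], points[j]) ** power
            for i in range(n) for j in range(i + 1, n)}
    members = {i: [i] for i in range(n)}
    levels = {n: list(range(n))}
    while len(members) > 1:
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])  # closest pair, i < j
        for k in members:
            if k not in (i, j):
                key_i, key_j = (min(i, k), max(i, k)), (min(j, k), max(j, k))
                dist[key_i] = merged_dissimilarity(
                    method, dist[key_i], dist[key_j], d_ij,
                    len(members[i]), len(members[j]), len(members[k]))
        members[i] += members.pop(j)  # cluster j is absorbed into cluster i
        dist = {key: value for key, value in dist.items() if j not in key}
        labels = [0] * n
        for label, cluster in enumerate(members.values()):
            for obj in cluster:
                labels[obj] = label
        levels[len(members)] = labels
    return levels


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical random-noise data: 50 points, uniform in 4 dimensions.
    noise = [[rng.random() for _ in range(4)] for _ in range(50)]
    for method in ("single", "complete", "average", "ward"):
        levels = hierarchy(noise, method)
        sizes = sorted(levels[3].count(label) for label in set(levels[3]))
        print(method, "cluster sizes at the 3-cluster level:", sizes)
```

Printing the cluster sizes at one level is a quick way to see how differently the methods partition the same noise, which is the kind of cluster-size difference that feeds into the term d.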
For the Hubert and Arabie adjusted Rand statistic, the upper variability limit was .07. The Hubert and Arabie index consistently produced mean values closer to 0.0 than did the Morey and Agresti measure.

It is useful to compare the variability found for the Rand and the Fowlkes and Mallows measures in Figures 1-3 with that indicated in Figure 3 of the Fowlkes and Mallows article (1983, p. 558). The variability suggested in their paper substantially underestimates the actual variability found for these two indices in the present study. This could have serious implications for applying the indices to actual test clusterings.

Dimensionality factor. The effect of the number of underlying dimensions was examined in the present study. The results were highly consistent across the 4-, 6-, and 8-dimensional data sets: all three dimensionality levels produced plots similar to Figure 1, the equal density condition. As such, the dimensionality of the data seems to have little if any effect on the external criteria. Again, a relatively flat response curve was found with the adjusted Rand indices.

Number of clusters. The results for the number of hypothetical clusters specified by the criterion partition indicated that the case involving two clusters differed from those using 3, 4, or 5 clusters. In the two cluster case, the Rand index displayed a pattern that is the reverse of that typically found. That is, the largest average values were found when few clusters were present in the algorithm solution. It seems likely that the term d, which is minimized at the level of two clusters, is responsible for this effect. The index reverted to a more typical pattern when 3, 4, or 5 clusters were specified.
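The claim that the term d drives the level-dependent behavior of the Rand index can be probed with a deliberately simplified chance-match simulation. This is not the study's design: instead of clustering random noise hierarchically, each "obtained" partition here is an independent uniform random labeling into k groups, and the criterion is a fixed, near-equal partition of n objects (both assumptions, as are the function names and default settings). Since random labels carry no information, any systematic change in the Rand mean with k reflects the changing share of d pairs, while the adjusted index should stay near zero. The sketch reports each index's mean together with twice its standard deviation.

```python
import random
import statistics
from itertools import combinations


def rand_and_adjusted(labels_x, labels_y):
    """Return (Rand, Hubert-Arabie adjusted Rand) computed from pair counts."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1
        elif same_x:
            b += 1
        elif same_y:
            c += 1
        else:
            d += 1
    m = a + b + c + d
    expected = (a + b) * (a + c) / m
    maximum = ((a + b) + (a + c)) / 2
    return (a + d) / m, (a - expected) / (maximum - expected)


def chance_profile(n=50, criterion_k=4, levels=(2, 5, 10, 25, 50), trials=100, seed=1):
    """Mean and 2*SD of each index when the obtained partitions are pure chance."""
    rng = random.Random(seed)
    criterion = [i % criterion_k for i in range(n)]  # near-equal criterion groups
    profile = {}
    for k in levels:
        scores = {"rand": [], "adjusted": []}
        for _ in range(trials):
            obtained = [rng.randrange(k) for _ in range(n)]  # no real structure
            r, ar = rand_and_adjusted(obtained, criterion)
            scores["rand"].append(r)
            scores["adjusted"].append(ar)
        profile[k] = {name: (statistics.mean(v), 2 * statistics.stdev(v))
                      for name, v in scores.items()}
    return profile


if __name__ == "__main__":
    for k, stats in chance_profile().items():
        line = "  ".join(f"{name} {mean:.3f} +/- {spread:.3f}"
                         for name, (mean, spread) in stats.items())
        print(f"{k:2d} clusters: {line}")
```

Under this null model the expected Rand value is p_x p_y + (1 - p_x)(1 - p_y), where p_x = 1/k is the chance that a pair shares a random label and p_y = 288/1225 is the within-group pair share of the default criterion. That gives exactly 0.5 at k = 2 and about 0.754 at k = 50: the second term, the d pairs, takes over as k grows. Because the labels are unstructured rather than hierarchical solutions, the sketch is not expected to reproduce method-specific effects such as the restriction in range or the reversal in the two cluster case; it isolates only the contribution of d. Passing `n=100` gives a rough analogue of the expanded sample-size check.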
The patterns obtained for the Jaccard and the Fowlkes and Mallows measures were consistent with those found for the other design factors. It is interesting to note that rather large values can be obtained for the Fowlkes and Mallows measure: values in the range of .70 to .90 occurred frequently at the level of two clusters in the hierarchical solution. However, these values occurred with data where there was only a random match between the criterion and the obtained hierarchical solution. If one were comparing values across hierarchy levels, the bias of the statistic would lead one to select a solution with few clusters even though no significant structure existed in the data. Furthermore, the magnitude of the values might lead one to believe that significant recovery had occurred. Similar comments hold for the Jaccard index in the range .50 to 30.

Given the unacceptable patterns for the Rand, Fowlkes and Mallows, and Jaccard indices, only the two versions of the adjusted Rand index are presented in Figure 4. Since the results for 3, 4, and 5 clusters were fairly similar, only the 2 and 4 cluster cases are presented. The adjusted Rand indices produced fairly flat, although not identical, response curves (see Figure 4). Variability increased somewhat when few clusters were present in the hierarchical solution, followed by a reduction in variation with a larger number of clusters. The Morey and Agresti measure did tend to produce slightly greater variability. Further, its average values were greater than zero; in fact, for the four cluster data, the lower two standard deviation limit was well above the 0.0 axis line. This is consistent with the Hubert and Arabie (1985) results, which indicated that the Morey and Agresti index produces an incomplete adjustment and leaves a positive bias.
On the other hand, the Hubert and Arabie index produced a more desirable pattern, with all means very close to 0.0.

Figure 4. Mean value plots for data sets hypothesized to possess two and four clusters (first and second rows, respectively) for the two versions of the adjusted Rand index (panels: Hubert and Arabie, Morey and Agresti; x-axis: number of clusters).

Expanded sample size. As mentioned in the methodology section, analyses were conducted with 540 random noise data sets of 100 points each, as opposed to 50 points per set. The results for the density, dimensionality, and number of clusters factors were quite consistent with those based on data sets with 50 points each. The Rand index displayed effectively no change in mean response or in the standard deviation limits. The patterns for the Jaccard and the Fowlkes and Mallows indices were unchanged, except for a slight reduction in mean response values. The adjusted Rand indices produced the same flat response curves as seen with the 50 point data sets. Compared to the results with the smaller data sets, the mean response moved even closer to 0.0, with a reduction in the variance at all levels in the hierarchy. Values greater than about .10 would not be expected for the Morey and Agresti index if the match between the criterion and the obtained partition was due to chance. Similarly, for the Hubert and Arabie procedure, the upper limit would appear to be .05.

## Distinct Cluster Case

The results for the recovery of cluster structure when distinct clusters actually were present in the data are presented in Figure 5.
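For the distinct-cluster case, a hand-built example shows how the Hubert and Arabie index behaves around the correct level of a hierarchy. The labels are hypothetical and are not data from the study: the criterion has three groups of four objects, and the candidate partitions either split one group (too many clusters), match the criterion, merge two groups (too few clusters), or put every object in one cluster. The helper name `adjusted_rand` is likewise an illustrative assumption.

```python
from itertools import combinations


def adjusted_rand(labels_x, labels_y):
    """Hubert-Arabie adjusted Rand index computed from pair counts."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_x)), 2):
        same_x = labels_x[i] == labels_x[j]
        same_y = labels_y[i] == labels_y[j]
        if same_x and same_y:
            a += 1
        elif same_x:
            b += 1
        elif same_y:
            c += 1
        else:
            d += 1
    m = a + b + c + d
    expected = (a + b) * (a + c) / m
    maximum = ((a + b) + (a + c)) / 2
    return (a - expected) / (maximum - expected)


criterion = [0] * 4 + [1] * 4 + [2] * 4           # three distinct groups
candidates = {
    4: [0] * 4 + [1] * 4 + [2, 2, 3, 3],          # one group split
    3: list(criterion),                           # correct level
    2: [0] * 4 + [1] * 8,                         # two groups merged
    1: [0] * 12,                                  # everything merged
}
for k, labels in candidates.items():
    print(f"{k} clusters: {adjusted_rand(labels, criterion):.3f}")
```

The printed values are 0.836, 1.000, 0.522, and 0.000 for 4, 3, 2, and 1 clusters. Merging two true groups costs far more than splitting one, which is the asymmetry behind a sharp fall in the index once a hierarchy passes the correct level and too few clusters remain.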
Since the Hubert and Arabie adjusted Rand index appeared to be the best measure in the null case results, the presentation is limited to this criterion. Results for data sets containing 2, 3, 4, and 5 nonoverlapping clusters are presented separately in the figure. Given the rather strong clustering present in the data, the criterion value should indicate near perfect recovery of the structure by the methods at the correct level in the hierarchical solution. Thus, mean values approaching 1.00 were expected at these levels. Furthermore, a substantial drop in mean value should be seen in the plots for the 3, 4, and 5 cluster data sets once the correct level has been passed and too few clusters are present in the hierarchical solution. Such patterns are clearly seen in Figure 5 for the Hubert and Arabie adjusted Rand index. Further analysis of the behavior of the index with the structured data showed no difficulties across methods, density levels, or number of dimensions.

A final application based on the results of Figure 5 has been noted by a reviewer: the graph might help one determine the number of clusters present in a data set. Rather than using the index as an external validation measure, one could obtain the two partition sets from two different clusterings of the same data. Such replications of cluster solutions might offer insight into the structure of a real-life data set.

Figure 5. Mean Hubert and Arabie adjusted Rand value plots for 2, 3, 4, and 5 cluster data sets (x-axis: number of clusters).

## Discussion

The results of the present paper are consistent with those of Fowlkes and Mallows (1983) and Milligan and Schilling (1985). In general, all three studies found that the mean Rand index value increased as the number of clusters in the hierarchical solution increased. Similarly, the Fowlkes and Mallows index decreased as the number increased. However, the trend is not nearly as pronounced for the Rand index in the present study as in the Fowlkes and Mallows paper. In particular, at the level of 50 clusters in the Fowlkes and Mallows study, the average Rand value was in excess of .9, whereas in the present study the mean was approximately .6. Of course, the null case in the Fowlkes and Mallows paper is different from the one used here. However, the null case used in the present paper is more appropriate for the typical clustering validation study, where one has a fixed number of clusters in the criterion solution.

Neither the Rand nor the Fowlkes and Mallows index exhibited desirable patterns for the null case data. In all but one case, the Rand index produced its lowest average value at the point indicated by the criterion solution. Of course, one could argue that the index was indicating that the solution was particularly bad. However, the bias of the Rand index would lead one to select a solution with a large number of clusters. The only exception occurred when two clusters were hypothesized to be present in the data set. In this case, the bias was reversed. On the