Automatic determination of categories in unsupervised classification

Sunpyo Hong, Kiyonari Fukue, Haruhisa Shimoda, Toshibumi Sakata
Tokai University Research and Information Center
2-28-4 Tomigaya, Shibuya-ku, Tokyo 151, Japan

Abstract

A cluster categorization method is necessary when unsupervised classification is used for remote sensing image classification. It is desirable that this categorization be performed automatically, because manual categorization is highly time consuming. In this paper, several automatic determination methods are proposed and evaluated. They are: 1) the maximum number method, which assigns the target cluster to the category that occupies the largest area of that cluster; 2) the maximum percentage method, which assigns the target cluster to the category for which the largest percentage of the category falls within that cluster; 3) the minimum distance method, which assigns the target cluster to the category having the minimum distance from that cluster. The experiments confirmed that the result of the minimum distance method was almost the same as the result produced by a human operator.

Introduction

With the launch of second-generation high-resolution sensors such as LANDSAT TM and SPOT HRV, clustering methods have recently been re-evaluated. However, the main problem of clustering for practical use is that clustering is an unsupervised classification. That is, clusters generated by clustering are defined in feature vector space, not in image data. Therefore, in order to use the classified result as a meaningful reference map, it is necessary to determine the relation between clusters and categories, and to label the classified result with the categories. Conventionally, this relation has been determined mainly through interpretation by an operator. However, this process is time consuming and is not objective. The purpose of this research is to try several methods of automatic categorization and find out the most useful one. In this paper, three methods are examined.

Problems of Conventional Method

In the conventional method, each classified cluster is overlaid on the target image data on the display and interpreted by an operator to determine its category. The advantage of this approach is that the obtained result is natural and reliable. However, since everything is determined by an operator in this method, there are many problems, as follows.

To solve the above problems, several automatic categorization methods are considered, as follows. In all methods, training areas are first extracted from the image, as in supervised training.

1. Maximum Number Method

In this method, the number of pixels in each category is calculated for each cluster. Then the category having the maximum number is assigned to that cluster.

2. Maximum Percentage of Category Method

In this method, for each cluster, the percentage (occupation rate) of each category that falls within that cluster is calculated. Then the category having the maximum percentage is assigned to that cluster.

Fig. 1 shows a comparison of these two methods in a simple case. Suppose that cluster k is composed of three categories A, B and C. As shown in Fig. 1 (a), category A occupies the largest area in cluster k and C occupies the smallest area. In the maximum number method, cluster k is always assigned to category A. However, this figure does not show the difference in the total areas of each category. Fig. 1 (b) shows the case where the total area of each category is the same, and (c) shows the case where the total area of each category is different. As this figure shows, categories that occupy small areas in the image tend to be neglected in the maximum number method. In contrast, small-area categories are treated favorably in the maximum percentage method, as shown in Fig. 1.

Fig. 1 Cluster and Categories

3. Minimum Distance Method

In this method, the distance between each category and each cluster is calculated. Then the category having the minimum distance is assigned to that cluster. Euclidean distance was used in this experiment.

In the case of the maximum number method and the maximum percentage method, the result depends on the size and location of the training areas.
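
The difference between the two count-based methods can be illustrated with a small sketch. This is not the authors' implementation: the function names, the data layout (one `(cluster, category)` pair per training pixel) and the pixel counts are all hypothetical, chosen so that a small category is concentrated in cluster k.

```python
from collections import Counter

def count_table(pixels):
    """Count training pixels for every (cluster, category) pair."""
    return Counter(pixels)

def maximum_number(table, cluster):
    """Return the category with the most training pixels inside the cluster."""
    counts = {cat: n for (clu, cat), n in table.items() if clu == cluster}
    return max(counts, key=counts.get)

def maximum_percentage(table, cluster):
    """Return the category with the largest share of its own pixels in the cluster."""
    totals = Counter()
    for (clu, cat), n in table.items():
        totals[cat] += n  # total training pixels per category
    rates = {cat: n / totals[cat] for (clu, cat), n in table.items() if clu == cluster}
    return max(rates, key=rates.get)

# Hypothetical training pixels: category A is large, category C is small.
pixels = (
    [("k", "A")] * 60 + [("k", "B")] * 30 + [("k", "C")] * 10
    + [("other", "A")] * 540 + [("other", "B")] * 70 + [("other", "C")] * 10
)
table = count_table(pixels)

by_number = maximum_number(table, "k")          # A has 60 of the 100 pixels in k
by_percentage = maximum_percentage(table, "k")  # C has 10 of its 20 pixels in k
print(by_number, by_percentage)
```

With these made-up counts, category A dominates cluster k by pixel count, while category C has the highest occupation rate (half of its training pixels lie in k), so the two rules disagree in the way Fig. 1 (c) describes.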
In the minimum distance method, training area selection is easier than in the other two methods, because the geometrical information of the training area is not used.

Experiments and Results

1. Flow of Experiments

In order to evaluate the proposed methods described in chapter 3, the following LANDSAT TM data was used in the experiment. First, clusters were generated by hierarchical clustering using Ward's method. Since the image data in remote sensing is very large, clustering is usually performed with sampled data. In this experiment, 2500 samples (about 10% of the entire image data) were used to generate 66 clusters. Based on the 66 clusters, the target image data was classified by a maximum likelihood method. Secondly, a representative area of each category in the target image (training area) was selected. 14 categories were selected, as shown in Table 1. Finally, the relations between clusters and categories were determined by the 3 methods described in chapter 3.
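
As an illustration of the final step, the minimum distance rule might be sketched as follows. The paper states only that Euclidean distance was used; measuring it between the mean feature vectors of each cluster and each category is an assumption of this sketch, and the function names, category names and band values are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def minimum_distance(cluster_means, category_means):
    """Map each cluster to the category whose mean vector is nearest."""
    return {
        clu: min(category_means, key=lambda cat: euclidean(mean, category_means[cat]))
        for clu, mean in cluster_means.items()
    }

# Hypothetical two-band training pixels for two categories.
training = {
    "water": [[10, 5], [12, 7]],
    "forest": [[40, 80], [44, 84]],
}
category_means = {cat: mean_vector(vs) for cat, vs in training.items()}

# Hypothetical cluster means, as would be produced by the clustering step.
cluster_means = {1: [11, 6.5], 2: [39, 78]}

labels = minimum_distance(cluster_means, category_means)
print(labels)
```

Only the spectral statistics of the training pixels enter the calculation, which is consistent with the remark that the geometrical information of the training area is not used.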

To evaluate the results quantitatively, classification accuracy was estimated based on the test site data in the target image. The classification accuracy was calculated over 5 major categories, as shown in Table 1, to reconcile the categories selected in the target image with the test site categories.

2. Results of Experiments

Fig. 5 Result by Minimum Distance Method

Conclusions