AARS ACRS 1989, Land Use

Land cover classification by principal component histogram method

Katsunori Furuya
Graduate School of Science and Technology

M. Akasaka, R. Tateishi
Remote Sensing and Image Research Center

H. Ishii
Faculty of Horticulture Chiba University
1-33 Yayoi-cho, Chiba city, Chiba, 260 Japan


Abstract
The most widely used land cover classification method is the maximum likelihood method, which assumes a multi-dimensional normal distribution. This assumption does not always hold. We therefore propose a new classification method, the PC (principal component) Histogram method. The PC Histogram method uses three-dimensional distributions in principal component space, which are defined by three-dimensional PC histograms of the ground truth data. The method is therefore adaptable to various types of distribution of ground truth data. An evaluation of the PC Histogram method using TM data gave the following results. (1) The PC Histogram method has better classification accuracy than the maximum likelihood method for some categories. (2) CPU time for classification does not increase in proportion to the number of classified pixels.

Analyzed data and study area

Landsat TM data
The TM data used are listed in Table 1.

Table. 1 Landsat TM data
Date             : 1987 July 24
Path-Row         : 107-35
Processing Level : Bulk Processed

Study area
The study area is located about 40 km northeast of Tokyo, Japan. It extends about 18 km from north to south and about 27 km from east to west. The study area contains city areas, agricultural areas, a river and a lake. We used five categories: forest, paddy, urban, water and golf course.

Methodology
Figure 1 shows the flow of processing, the details of which are described below.
  1. Geometric registration

    The TM data were geometrically rectified and resampled. The rectified TM image of the study area is 900 by 600 pixels, amounting to 540,000 pixels.

  2. Ground truth data

    Ground truth data were collected from an enhanced TM image and maps. The ground truth data used for classification comprise 1399 pixels. The check data used to evaluate the results comprise 14971 pixels, including the ground truth data used for classification.


    Figure. 1 Flow of processing

  3. PC Histogram method

    1. Steps of PC Histogram method

      The PC Histogram method is a supervised classification method consisting of the following five steps. The flow of the PC Histogram method is shown in Figure 2.

      Step 1: Principal component analysis. To reduce the number of dimensions, principal component analysis is applied, and the 1st, 2nd and 3rd components are used for classification.

      Step 2: Three-dimensional histogram. For each category, a three-dimensional histogram of the ground truth data is produced from their 1st, 2nd and 3rd principal components. The sampling interval of the three principal components used to produce the histograms is decided by the variance of the principal components and the frequency of the ground truth data.

      Step 3: Interpolation of histogram frequency data into continuous three-dimensional grid data. The histograms produced in Step 2 have discontinuous frequency distributions: the frequency in some sampling intervals may be zero even though the surrounding intervals have non-zero frequencies. Interpolation is applied to produce a continuous frequency distribution for the next step.

      Step 4: Division of three-dimensional space into regions assigned to categories. To compare the distributions of all categories, the frequencies from Step 3 are converted to normalized frequencies. Normalization here means scaling the frequencies so that their sum is equal for every category. The normalized frequencies of the different categories are compared at every sampling interval, and the category with the highest normalized frequency is selected. In this way the three-dimensional space is divided into regions assigned to categories.

      Step 5: Classification of image data. For every pixel of the TM data, the values of the 1st, 2nd and 3rd principal components are calculated. Classification is performed using these values and the divided three-dimensional space of Step 4.
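
Steps 2 to 5 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the paper does not specify the interpolation scheme, so a 3x3x3 box average stands in for it; the sampling origin and interval (`origin`, `width`) are supplied by hand rather than derived from the variance; ties go to the first category encountered; and all function names are hypothetical. Label 6 for unclassified pixels follows the legend of the confusion matrices.

```python
from collections import defaultdict
from itertools import product


def bin_index(pc, origin, width):
    """Map a (pc1, pc2, pc3) value to the integer indices of its histogram cell."""
    return tuple(int((v - o) // w) for v, o, w in zip(pc, origin, width))


def build_histograms(samples, origin, width):
    """Step 2: one sparse 3-D histogram (cell -> count) per category."""
    hists = defaultdict(lambda: defaultdict(float))
    for pc, category in samples:
        hists[category][bin_index(pc, origin, width)] += 1.0
    return hists


def smooth(hist):
    """Step 3 (assumed form): 3x3x3 box average, so empty cells that touch
    occupied cells receive a non-zero frequency."""
    out = defaultdict(float)
    for cell, count in hist.items():
        for offset in product((-1, 0, 1), repeat=3):
            neighbour = tuple(c + d for c, d in zip(cell, offset))
            out[neighbour] += count / 27.0
    return out


def normalize(hist):
    """Step 4, first half: scale so the frequencies of a category sum to 1."""
    total = sum(hist.values())
    return {cell: f / total for cell, f in hist.items()}


def build_lookup(hists):
    """Step 4, second half: give each cell to the category with the highest
    normalized frequency."""
    best = {}
    for category, hist in hists.items():
        for cell, f in normalize(smooth(hist)).items():
            if cell not in best or f > best[cell][1]:
                best[cell] = (category, f)
    return {cell: category for cell, (category, _) in best.items()}


def classify(pc, lookup, origin, width, unclassified=6):
    """Step 5: look up the pixel's cell; cells outside every region are unclassified."""
    return lookup.get(bin_index(pc, origin, width), unclassified)


# Toy ground truth: (principal component triple, category label).
samples = [((0.2, 0.1, 0.0), 1), ((0.4, 0.3, 0.1), 1), ((5.2, 5.1, 5.0), 4)]
origin, width = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
hists = build_histograms(samples, origin, width)
lookup = build_lookup(hists)

print(classify((0.5, 0.5, 0.5), lookup, origin, width))     # prints 1
print(classify((20.0, 20.0, 20.0), lookup, origin, width))  # prints 6
```

Because the lookup table is built once from the ground truth data, classifying a pixel reduces to computing its cell index and reading the table.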

    2. Maximum likelihood method

      To verify the PC Histogram method, the maximum likelihood method was applied to the same TM data and ground truth data.
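
The paper does not write out the decision rule it used. For reference, the standard Gaussian maximum likelihood rule assigns a pixel vector x to the category k with the largest discriminant value. The form below assumes equal prior probabilities for all categories, and the symbols are introduced here for illustration only:

```latex
g_k(\mathbf{x}) = -\frac{1}{2}\ln\lvert\Sigma_k\rvert
  - \frac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu}_k)^{\mathsf{T}}\,\Sigma_k^{-1}\,(\mathbf{x}-\boldsymbol{\mu}_k)
```

Here the mean vector and covariance matrix of category k are estimated from its ground truth pixels. This is where the multi-dimensional normal assumption enters: each category is summarized by its mean and covariance alone.
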
Results
Table 2 shows the results of the principal component analysis. The cumulative contribution up to the 3rd component is about 94%. The confusion matrices of the PC Histogram method and the maximum likelihood method are shown in Tables 3 and 4. The classification accuracies of the PC Histogram method are broadly similar to those of the maximum likelihood method. [...] of the maximum likelihood method is 2 times that of the PC Histogram method.

Conclusion
The PC Histogram method is adaptable to various types of distribution of ground truth data. An evaluation of the PC Histogram method using TM data gave the following results. (1) The PC Histogram method has better classification accuracy than the maximum likelihood method for some categories, such as forest and water, which do not have normal distributions in the study area. (2) CPU time for classification by the PC Histogram method does not increase in proportion to the number of classified pixels.

Reference
  • K. Furuya, M. Akasaka and R. Tateishi: "Land cover classification by Principal Component Histogram method", Proc. Conference of JSPRS, Tokyo, pp. 103-106 (1989)

Table. 2 Results of principal component analysis

                             1st PC   2nd PC   3rd PC   4th PC   5th PC   6th PC   7th PC
Contributions                0.6621   0.2098   0.0684   0.0360   0.0106   0.0083   0.0046
Cumulative Contributions     0.6621   0.8719   0.9404   0.9764   0.9870   0.9953   1.0000
Eigen Values                 4.6355   1.4684   0.4792   0.2522   0.0744   0.0581   0.0322
Eigen Vectors
  BAND 1                     0.4372   0.4328   0.4446  -0.1705   0.2907   0.3557   0.4287
  BAND 2                     0.1000  -0.0901   0.0362  -0.7288  -0.5938   0.2680  -0.1582
  BAND 3                     0.3001   0.4385   0.2832   0.0501  -0.3113  -0.6949  -0.2349
  BAND 4                     0.2499   0.1860   -0.158   0.5527  -0.2965   0.5466  -0.4589
  BAND 5                    -0.6566   0.0929   0.6802  -0.0228   0.0709   0.1138  -0.2811
  BAND 6                    -0.2344  -0.0195   0.1254   0.3328  -0.6108   0.0096   0.6671
  BAND 7                    -0.4003   0.7541  -0.4921  -0.1433  -0.0082   0.0834   0.0367
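
The contribution rows of Table 2 follow directly from the eigenvalue row: each contribution is an eigenvalue divided by the sum of all eigenvalues. The short sketch below recomputes them as a check. The variable names are illustrative. The eigenvalues sum to about 7, the number of TM bands, which is what a principal component analysis of the correlation matrix would give, although the paper does not say which matrix was used.

```python
from itertools import accumulate

# Eigenvalues as listed in Table 2 (1st to 7th principal component).
eigenvalues = [4.6355, 1.4684, 0.4792, 0.2522, 0.0744, 0.0581, 0.0322]

total = sum(eigenvalues)
contributions = [ev / total for ev in eigenvalues]
cumulative = list(accumulate(contributions))

for i, (c, cum) in enumerate(zip(contributions, cumulative), start=1):
    print(f"PC{i}: contribution {c:.4f}, cumulative {cum:.4f}")
```

The third cumulative value is the "about 94%" quoted in the Results, which is the basis for keeping only the first three components.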

Table. 3 Confusion matrix by PC Histogram method

Known     Number of  Percent  Number of pixels classified into category
category  pixels     correct      1      2      3      4      5      6
1              1244     94.9   1181     30      8     15      9      1
2              8117     78.9    970   6350    244     10    543      0
3              1747     91.4     13    115   1597      1     21      0
4              2899     94.7    148      4      0   2746      0      1
5               964     83.9     17     85     52      1    809      0

Table. 4 Confusion matrix by Maximum likelihood method

Known     Number of  Percent  Number of pixels classified into category
category  pixels     correct      1      2      3      4      5      6
1              1244     86.4   1075      1     55      0      0    113
2              8117     80.5      3   6532      0      0    102   1479
3              1747     97.5      0      0   1705      0      0     42
4              2899     77.0     28      0      0   2238      0    633
5               964     83.5      5      0     80      0    805     74

1-Forest, 2-Paddy, 3-Urban, 4-Water, 5-Golf course, 6-Unclassified

Overall classification performance: 82.5% (total correct pixels / total pixels)
Average performance by class: 85.0% (average of category accuracies)
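
The two summary figures can be checked against Table 4 (maximum likelihood method). The sketch below takes, for each category, the number of check pixels and the correctly classified count implied by the percent-correct column, then applies the two definitions given in parentheses above. The variable names are illustrative.

```python
# Check pixels per category and correctly classified counts from Table 4
# (categories 1 to 5: forest, paddy, urban, water, golf course).
totals = [1244, 8117, 1747, 2899, 964]
correct = [1075, 6532, 1705, 2238, 805]

# Overall performance: total correct pixels / total pixels.
overall = sum(correct) / sum(totals)

# Average performance by class: mean of the per-category accuracies.
per_class = [c / t for c, t in zip(correct, totals)]
average = sum(per_class) / len(per_class)

print(f"Overall: {overall:.1%}")            # prints "Overall: 82.5%"
print(f"Average by class: {average:.1%}")   # prints "Average by class: 85.0%"
```

The overall figure weights each category by its pixel count, so it is dominated by paddy (8117 of 14971 check pixels), while the class average treats all five categories equally.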