Land cover classification by principal component histogram method
Katsunori Furuya
Graduate School of Science and Technology
M. Akasaka, R. Tateishi
Remote Sensing and Image Research Center
H. Ishii
Faculty of Horticulture
Chiba University, 1-33 Yayoi-cho, Chiba City, Chiba 260, Japan
Abstract
The most widely used land cover classification method is the maximum likelihood method, which assumes a multi-dimensional normal distribution. However, this assumption does not always hold. To address this, we propose a new classification method, the PC (principal component) Histogram method. The PC Histogram method represents the distribution of each category in three-dimensional principal component space by a three-dimensional PC histogram of the ground truth data. The PC Histogram method is therefore adaptable to various types of distribution of ground truth data. An evaluation of the PC Histogram method using TM data gave the following results. (1) The PC Histogram method achieves better classification accuracy than the maximum likelihood method for some categories. (2) The CPU time for classification does not increase in proportion to the number of classified pixels.
Analyzed data and study area
Landsat TM data
The Landsat TM data used in this study are listed in Table 1.
Table 1. Landsat TM data
| Date | 1987 July 24 |
| Path-Row | 107-35 |
| Processing level | Bulk processed |
Study area
The study area is located 40 km northeast of Tokyo, Japan. It extends about 18 km from north to south and about 27 km from east to west, and contains urban areas, agricultural areas, a river and a lake. Five categories were used: forest, paddy, urban, water and golf course.
Methodology
Figure 1 shows the flow of processing, the details of which are described below.
- Geometric registration
The TM data were geometrically registered and resampled. The registered TM data cover the study area with 900 by 600 pixels, amounting to 540,000 pixels.
- Ground truth data
Ground truth data were collected from an enhanced TM image and maps. The ground truth data used for classification comprise 1,399 pixels. The check data used to evaluate the results comprise 14,971 pixels, including the ground truth data used for classification.
Figure 1. Flow of processing
PC Histogram method
- Steps of PC Histogram method
The PC Histogram method is a supervised classification method consisting of the following five steps. The flow of the PC Histogram method is shown in Figure 2.
Step 1: Principal component analysis
To reduce the dimensionality, principal component analysis is applied to the seven TM bands, and the 1st, 2nd and 3rd principal components are used for classification.
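Step 1 can be sketched with NumPy as below. This is a minimal sketch under an assumption, not the authors' code: it performs PCA on the correlation matrix of the bands (i.e. on standardized bands). The eigenvalues in Table 2 sum to 7.0000, the number of TM bands, and because the trace of a correlation matrix equals its dimension this is consistent with (though it does not prove) a correlation-matrix PCA. The function name `principal_components` is hypothetical.

```python
import numpy as np


def principal_components(pixels, n_components=3):
    """Correlation-matrix PCA of multiband pixel values (a sketch, not the paper's code).

    pixels: array of shape (n_pixels, n_bands).
    Returns (scores, eigenvalues, contributions), eigenvalues in descending order.
    """
    pixels = np.asarray(pixels, dtype=float)
    # Standardize each band so that the covariance of z is the correlation matrix.
    z = (pixels - pixels.mean(axis=0)) / pixels.std(axis=0)
    corr = np.corrcoef(pixels, rowvar=False)
    # eigh returns the eigenvalues of a symmetric matrix in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Contribution of each PC = its eigenvalue / sum of all eigenvalues.
    contributions = eigvals / eigvals.sum()
    # Project the standardized pixels onto the leading eigenvectors.
    scores = z @ eigvecs[:, :n_components]
    return scores, eigvals, contributions


# Example with 7 synthetic, strongly correlated bands.
rng = np.random.default_rng(0)
bands = rng.normal(size=(500, 1)) + 0.3 * rng.normal(size=(500, 7))
scores, eigvals, contrib = principal_components(bands)
print("cumulative contribution up to PC3:", contrib[:3].sum())
```

With real data, `contrib[:3].sum()` corresponds to the cumulative contribution up to the 3rd component reported in Table 2.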
Step 2: Three-dimensional histogram
For each category, a three-dimensional histogram of the ground truth data is produced from their 1st, 2nd and 3rd principal component values. The sampling interval (bin width) of each principal component used to produce the histograms is determined by the variance of that principal component and by the frequency of the ground truth data.
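Step 2 can be sketched as follows, assuming the PC scores of the ground truth pixels from Step 1 are available. The paper derives the sampling interval from the variance of each principal component and the ground truth frequency but gives no formula, so the rule used here (bin width equal to a hypothetical `width_factor` times the standard deviation of that component) is only a placeholder. The function names are also hypothetical. All categories share the same bin edges, which is what makes the cell-by-cell comparison in Step 4 possible.

```python
import numpy as np


def make_edges(train_scores, width_factor=0.5):
    """Common bin edges for each PC (hypothetical rule: width = width_factor * std)."""
    train_scores = np.asarray(train_scores, dtype=float)
    lo, hi = train_scores.min(axis=0), train_scores.max(axis=0)
    width = width_factor * train_scores.std(axis=0)
    width = np.where(width > 0, width, 1.0)  # avoid zero-width bins for constant PCs
    edges = []
    for k in range(train_scores.shape[1]):
        n = max(1, int(np.ceil((hi[k] - lo[k]) / width[k])))
        e = lo[k] + width[k] * np.arange(n + 1)
        e[-1] = max(e[-1], hi[k])  # guard against floating-point shortfall
        edges.append(e)
    return edges


def category_histograms(train_scores, labels, edges):
    """One 3-D frequency histogram per category, all on the same grid."""
    train_scores = np.asarray(train_scores, dtype=float)
    labels = np.asarray(labels)
    return {c: np.histogramdd(train_scores[labels == c], bins=edges)[0]
            for c in np.unique(labels)}


# Example: 200 synthetic ground truth pixels in two categories.
rng = np.random.default_rng(1)
s = rng.normal(size=(200, 3))
lab = np.repeat([0, 1], 100)
edges = make_edges(s)
hists = category_histograms(s, lab, edges)
```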
Step 3: Interpolation of histogram frequency data into continuous three-dimensional grid data
The histograms produced in Step 2 have discontinuous frequency distributions: the frequency in some intervals may be zero even though the surrounding intervals have non-zero frequencies. Interpolation is therefore applied to produce a continuous frequency distribution for the next step.
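The paper does not say which interpolation it uses in Step 3. As a stand-in, the hypothetical `fill_gaps` below fills each empty cell that has at least one non-empty face neighbour with the mean of those neighbours, repeated for a chosen number of passes; smoothing with a Gaussian kernel would be another reasonable choice. It illustrates the idea of removing isolated zero cells and is not the authors' procedure.

```python
import numpy as np


def fill_gaps(hist, iterations=1):
    """Fill zero cells with the mean of their non-zero face neighbours (6-connectivity)."""
    h = np.asarray(hist, dtype=float).copy()
    for _ in range(iterations):
        padded = np.pad(h, 1, mode="constant")  # zero border, so no wrap-around
        nb_sum = np.zeros_like(h)
        nb_cnt = np.zeros_like(h)
        for axis in range(3):
            for shift in (-1, 1):
                # Value of the neighbour one step along `axis`, cropped to the grid.
                nb = np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
                nb_sum += nb
                nb_cnt += nb > 0
        # Update all eligible cells simultaneously from the previous pass.
        mask = (h == 0) & (nb_cnt > 0)
        h[mask] = nb_sum[mask] / nb_cnt[mask]
    return h


# Example: an empty cell between two filled cells.
h = np.zeros((3, 3, 3))
h[0, 1, 1], h[2, 1, 1] = 2.0, 4.0
filled = fill_gaps(h)
```

Filling changes the total frequency of a category, which is harmless here because Step 4 normalizes each category afterwards.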
Step 4: Division of the three-dimensional space into regions assigned to categories
To compare the distributions of all categories, the frequencies from Step 3 are converted to normalized frequencies, i.e. they are scaled so that the sum of the frequencies of each category becomes equal. At every sampling interval, the normalized frequencies of the different categories are compared and the category with the highest normalized frequency is selected. In this way the three-dimensional space is divided into regions assigned to categories.
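Step 4 can be sketched as below, assuming the (interpolated) histograms of all categories share one grid, stored in a dict keyed by category number. Normalizing each histogram to sum to 1 implements "the sum of frequencies of each category becomes equal". Marking cells where every category has zero frequency as unclassified (-1) is an assumption suggested by the "Unclassified" column of the confusion matrices; `build_decision_grid` is a hypothetical name.

```python
import numpy as np


def build_decision_grid(hists, unclassified=-1):
    """Assign each grid cell to the category with the highest normalized frequency."""
    categories = list(hists)
    # Normalize so that the frequencies of each category sum to 1.
    stack = np.stack([hists[c] / hists[c].sum() for c in categories])
    best = np.argmax(stack, axis=0)
    grid = np.asarray(categories)[best]
    # Cells with no frequency for any category are left unclassified (assumption).
    grid[stack.max(axis=0) == 0] = unclassified
    return grid


# Category 2 has ten times as many training pixels as category 1.
hists = {1: np.array([3.0, 1.0, 0.0]).reshape(3, 1, 1),
         2: np.array([4.0, 36.0, 0.0]).reshape(3, 1, 1)}
grid = build_decision_grid(hists)
```

In the first cell the raw counts (3 vs 4) would favour category 2, but the normalized frequencies (0.75 vs 0.1) select category 1; without normalization, categories with more ground truth pixels would claim cells they are relatively rare in.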
Step 5: Classification of image data
At every pixel of the TM data, the values of the 1st, 2nd and 3rd principal components are calculated. Each pixel is then classified according to the region of the divided three-dimensional space from Step 4 into which these values fall.
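Step 5 reduces to a table lookup: each pixel's three PC values are mapped to a grid cell using the same bin edges as in Step 2, and the category stored for that cell in Step 4 is read off. The sketch below assumes `edges` is a list of three increasing edge arrays and `label_grid` an integer array of matching shape; these names, and the choice to mark pixels outside the histogram range as unclassified (-1), are assumptions. The per-pixel work is a lookup rather than a density evaluation, so it does not depend on the shape of the category distributions.

```python
import numpy as np


def classify_pixels(scores, edges, label_grid, unclassified=-1):
    """Look up each pixel's (PC1, PC2, PC3) cell in the divided space of Step 4."""
    scores = np.asarray(scores, dtype=float)
    inside = np.ones(len(scores), dtype=bool)
    idx = []
    for k in range(3):
        e = edges[k]
        # Bin index i such that e[i] <= value < e[i + 1].
        i = np.searchsorted(e, scores[:, k], side="right") - 1
        # A value equal to the last edge belongs to the last bin (as in histogramdd).
        i[scores[:, k] == e[-1]] = len(e) - 2
        inside &= (i >= 0) & (i < len(e) - 1)
        idx.append(np.clip(i, 0, len(e) - 2))
    labels = label_grid[idx[0], idx[1], idx[2]]  # advanced indexing returns a copy
    labels[~inside] = unclassified  # pixels outside the histogram range
    return labels


edges = [np.array([0.0, 1.0, 2.0])] * 3
grid = np.arange(8).reshape(2, 2, 2)
px = np.array([[0.5, 0.5, 0.5], [1.5, 0.5, 1.5], [2.0, 2.0, 2.0],
               [3.0, 0.0, 0.0], [-0.1, 0.0, 0.0]])
print(classify_pixels(px, edges, grid))  # prints [ 0  5  7 -1 -1]
```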
- Maximum likelihood method
For the verification of the PC Histogram method, the maximum likelihood method was applied to the same TM data and ground truth data.
Results
Table 2 shows the results of the principal component analysis. The cumulative contribution up to the 3rd component is about 94%. The confusion matrices of the PC Histogram method and the maximum likelihood method are shown in Table 3 and Table 4, respectively. The classification accuracies of the PC Histogram method are almost the same as those of the maximum likelihood method, while the CPU time for classification by the maximum likelihood method is about twice that of the PC Histogram method.
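For comparison, the Gaussian maximum likelihood classifier against which the PC Histogram method was verified can be sketched as below. It is a minimal textbook version, not the software used in the study: equal prior probabilities are assumed, and no rejection threshold is applied, although the "Unclassified" column in Table 4 suggests the study used one. Which features were fed to it (all seven bands or PC scores) is not stated in the paper, so the sketch accepts any feature vectors; the function names are hypothetical. Each category is summarized by a single mean and covariance, i.e. one ellipsoidal normal distribution, which is exactly the assumption the PC Histogram method avoids.

```python
import numpy as np


def fit_gaussian(samples_by_class):
    """Estimate mean, inverse covariance and log-determinant for each category."""
    params = {}
    for c, x in samples_by_class.items():
        x = np.asarray(x, dtype=float)
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)
        _, logdet = np.linalg.slogdet(cov)
        params[c] = (mean, np.linalg.inv(cov), logdet)
    return params


def ml_classify(x, params):
    """Assign each row of x to the category with the largest Gaussian log-likelihood."""
    x = np.asarray(x, dtype=float)
    classes = list(params)
    loglik = np.empty((len(x), len(classes)))
    for j, c in enumerate(classes):
        mean, inv_cov, logdet = params[c]
        d = x - mean
        # Squared Mahalanobis distance d^T S^-1 d for every row.
        maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)
        loglik[:, j] = -0.5 * (logdet + maha)  # constant term dropped
    return np.asarray(classes)[np.argmax(loglik, axis=1)]


rng = np.random.default_rng(2)
train = {0: rng.normal(0.0, 1.0, size=(100, 3)), 1: rng.normal(10.0, 1.0, size=(100, 3))}
params = fit_gaussian(train)
pred = ml_classify(np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]), params)
```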
Conclusion
The PC Histogram method is adaptable to various types of distribution of ground truth data. An evaluation of the PC Histogram method using TM data gave the following results. (1) The PC Histogram method achieves better classification accuracy than the maximum likelihood method for some categories, such as forest and water, whose distributions in the study area are not normal. (2) The CPU time for classification by the PC Histogram method does not increase in proportion to the number of classified pixels.
Reference
- K. Furuya, M. Akasaka and R. Tateishi: "Land cover classification by Principal Component Histogram method", Proc. Conference of JSPRS, Tokyo, pp. 103-106 (1989).
Table 2. Results of principal component analysis
| | 1st PC | 2nd PC | 3rd PC | 4th PC | 5th PC | 6th PC | 7th PC |
| Contribution | 0.6621 | 0.2098 | 0.0684 | 0.0360 | 0.0106 | 0.0083 | 0.0046 |
| Cumulative contribution | 0.6621 | 0.8719 | 0.9404 | 0.9764 | 0.9870 | 0.9953 | 1.0000 |
| Eigenvalue | 4.6355 | 1.4684 | 0.4792 | 0.2522 | 0.0744 | 0.0581 | 0.0322 |
| Eigenvectors | | | | | | | |
| Band 1 | 0.4372 | 0.4328 | 0.4446 | -0.1705 | 0.2907 | 0.3557 | 0.4287 |
| Band 2 | 0.1000 | -0.0901 | 0.0362 | -0.7288 | -0.5938 | 0.2680 | -0.1582 |
| Band 3 | 0.3001 | 0.4385 | 0.2832 | 0.0501 | -0.3113 | -0.6949 | -0.2349 |
| Band 4 | 0.2499 | 0.1860 | -0.158 | 0.5527 | -0.2965 | 0.5466 | -0.4589 |
| Band 5 | -0.6566 | 0.0929 | 0.6802 | -0.0228 | 0.0709 | 0.1138 | -0.2811 |
| Band 6 | -0.2344 | -0.0195 | 0.1254 | 0.3328 | -0.6108 | 0.0096 | 0.6671 |
| Band 7 | -0.4003 | 0.7541 | -0.4921 | -0.1433 | -0.0082 | 0.0834 | 0.0367 |

Table 3. Confusion matrix by the PC Histogram method
| Known category | Number of pixels | Percent correct | Number of pixels classified into category | | | | | |
| | | | 1 | 2 | 3 | 4 | 5 | 6 |
| 1 | 1244 | 94.9 | 1181 | 30 | 8 | 15 | 9 | 1 |
| 2 | 8117 | 78.9 | 970 | 6350 | 244 | 10 | 543 | 0 |
| 3 | 1747 | 91.4 | 13 | 115 | 1597 | 1 | 21 | 0 |
| 4 | 2899 | 94.7 | 148 | 4 | 0 | 2746 | 0 | 1 |
| 5 | 964 | 83.9 | 17 | 85 | 52 | 1 | 809 | 0 |

Table 4. Confusion matrix by the maximum likelihood method
| Known category | Number of pixels | Percent correct | Number of pixels classified into category | | | | | |
| | | | 1 | 2 | 3 | 4 | 5 | 6 |
| 1 | 1244 | 86.4 | 1075 | 1 | 55 | 0 | 0 | 113 |
| 2 | 8117 | 80.5 | 3 | 0 | 6532 | 0 | 102 | 1479 |
| 3 | 1747 | 97.5 | 0 | 0 | 1705 | 0 | 0 | 42 |
| 4 | 2899 | 77.0 | 28 | 0 | 0 | 2238 | 0 | 633 |
| 5 | 964 | 83.5 | 5 | 0 | 80 | 0 | 805 | 74 |

Categories: 1-Forest, 2-Paddy, 3-Urban, 4-Water, 5-Golf course, 6-Unclassified
Overall classification performance: 82.5% (total correct pixels / total pixels)
Average performance by class: 85.0% (average of category accuracies)
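The two summary figures under Table 4 can be reproduced from the per-category counts, as sketched below (`accuracy_summary` is a hypothetical helper). One assumption is needed: in the extracted Table 4 the paddy row's 6,532 appears under category 3, but it is taken here as the correctly classified paddy count because 6532 / 8117 gives the printed 80.5%. Unclassified pixels count as errors, since they are included in each category's total.

```python
import numpy as np


def accuracy_summary(correct, totals):
    """Per-category, overall and class-averaged accuracy from pixel counts.

    correct[i]: pixels of known category i classified into category i.
    totals[i]:  check pixels of known category i (unclassified ones count as errors).
    """
    correct = np.asarray(correct, dtype=float)
    totals = np.asarray(totals, dtype=float)
    per_class = correct / totals
    overall = correct.sum() / totals.sum()  # total correct pixels / total pixels
    average = per_class.mean()              # average of category accuracies
    return per_class, overall, average


# Correct counts and row totals of Table 4 (maximum likelihood method).
per_class, overall, average = accuracy_summary(
    [1075, 6532, 1705, 2238, 805], [1244, 8117, 1747, 2899, 964])
print(f"overall {overall:.1%}, average by class {average:.1%}")
# prints: overall 82.5%, average by class 85.0%
```

The two measures differ because the overall figure is dominated by the large paddy class (8,117 of 14,971 check pixels), while the class average weights every category equally.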