Development of a NOAA image database with feature-based retrieval functions

Changming Zhou and Mikio Takagi
Institute of Industrial Science, University of Tokyo, Tokyo, Japan
Abstract

In this paper, a NOAA-AVHRR image database system which is being developed in our laboratory is presented. Some new image retrieval approaches based on image features are included in this system. After one scene of a NOAA image is received, an automatic classification method proposed in this paper is applied to a 2048 pixels x 1960 lines region around Japan, which usually arouses users' interest. The classified image is then processed with region-labeling and boundary-following methods. Each labeled region is represented using chain codes and stored in the system. The spatial relations between regions in the same scene are described by a syntactic pattern recognition approach, i.e., an edge-labeled directed node-label controlled graph (edNLC-graph) based on the center-of-mass of each region. In addition to a menu-driven user interface, the system provides its users with two kinds of guide images, and users may use a mouse tool to specify the objective region and conditions of image contents (e.g., with or without clouds, the shapes of clouds, etc.) on a display instrument, and retrieve images in the following two steps: (1) global retrieval based on the spatial relations between regions represented by the edNLC-graph, and (2) similarity retrieval based on the geometric properties of the dominant regions represented by chain codes. In addition, in order to improve the retrieval speed, and since information about cloud-covered regions is the main retrieval clue usually given by users, bit-based operation functions of a general-purpose image processor are applied to an index image generated from the original images to pre-select candidate images before the above two-step retrieval processing.
Introduction

In most of the remotely sensed image database systems developed up to now, images are recalled mainly by the attribute information attached to the archived images, such as image identities, sensor names, etc. Nowadays, however, because remotely sensed images are integrated into geographic information systems (GIS) together with various maps and other data, more effective retrieval approaches, for instance retrieval methods based on image contents, are required. The development of such retrieval methods has become a key subject of image databases and GIS.
In the case of NOAA images, considerable receiving, archiving and processing systems have been developed at many ground stations [1][3]. There exist a few systems that distribute a kind of abstract image called quick-look and provide users with visual inspection after images are received [1][4]. In [1], raw images are classified into a few classes such as land, sea, etc., denoted as hatched patterns, with a simple classification method based on some empirically obtained thresholds. In [4], 10-bit original images are reduced to 8-bit images with a size of 512 x 480 pixels; moreover, the reduced images are transformed into dither images with 64 levels using a general-purpose image processor and delivered via facsimile to other universities and research organizations immediately after the images are received. In all the systems mentioned above, images can be recalled by using acquisition time and some geographic parameters (longitude, latitude, etc.); however, retrieval approaches based on image features are not offered.
NOAA-AVHRR (Advanced Very High Resolution Radiometer) images are widely used in many fields. About 4 to 8 scenes per day can be received from the two NOAA satellites (at present, NOAA-10 and NOAA-11 are available). Like other remotely sensed data, NOAA-AVHRR images possess the characteristics of wide coverage, frequent observation and vast quantity, and are utilized for monitoring and observing the environment of the Earth. Consequently, users usually access those images that are received under some special conditions and possess some features. This kind of access cannot be realized only with the attribute data of images managed by a conventional database management system, e.g., a relational database management system.
The system presented in this paper is developed mainly to provide users not only with conventional retrieval approaches but also with approaches based on image features. Because processing such as geometric distortion correction and sensor calibration of NOAA-AVHRR images is very time- and space-consuming, only raw NOAA data are handled in this system. Structural and syntactic pattern recognition methods, iconic indexing and similarity retrieval approaches are introduced into this system for feature extraction, description, representation and feature-based retrieval of NOAA-AVHRR images.
Feature Extraction of NOAA-AVHRR Images

In order to extract features of NOAA-AVHRR images, first of all we must classify the raw images. The classification algorithm applied in this system must be fast, and absolute accuracy is less crucial because, essentially, the classified images are only required to support global feature description. An automatic classification method has been proposed in [1][2], but it mainly utilizes empirically obtained absolute brightness temperature thresholds and cannot be applied to this system, because we use uncalibrated data and such temperature thresholds are inapplicable to our case due to regional differences. Because of the huge quantity of NOAA-AVHRR images, the classification techniques proposed so far, which are fundamentally based on pixel-by-pixel processing, are not suitable in this case. We therefore select histogram-based approaches to classify NOAA-AVHRR images. Many approaches based on histograms have been proposed for image binarization; even though most of them can be extended to multilevel quantization, the number of classes must be specified in advance. For NOAA-AVHRR images, however, the class number is not fixed because of the variation of images under different weather conditions. Consequently, a peak detection technique is selected to determine the class number and the thresholds.
- Peak Detection Method Based on the Histograms

The peak detection technique proposed in [5][6] uses the image cumulative distribution function (cdf) to locate the peaks of the histogram. The peaks are located using the zero-crossings and local extrema of a peak detection signal generated from the cdf. For an image represented by M gray levels, the cdf c(n) can be derived from the gray-level histogram, and from c(n) a new function cN(n) can be obtained by (1):

cN(n) = c(n) * wN(n) .......................(1)

where * denotes the convolution operation and the uniform rectangular window wN is defined as in (2):

wN(m) = 1/N, -(N-1)/2 <= m <= (N-1)/2 .......................(2)

Then a peak detection signal rN(n) can be defined as in (3):

rN(n) = c(n) - cN(n) .......................(3)

The following principles are applied to the detection signal rN to estimate the start, maximum and end points of the peaks: (1) A zero-crossing of the detection signal to negative values indicates the start of a peak, denoted by si for the ith one. A zero-crossing of the detection signal to positive values following a negative crossover estimates the gray level at which the peak attains its maximum, and this gray level is denoted by mi. Similarly, si+1 and mi+1 can be obtained. (2) The gray level between two successive negative crossovers at which the detection signal attains its local maximum is defined to be the end point of the peak; for the ith peak, this gray level is denoted by ei. One peak will be represented by the three parameters (si, mi, ei) later in this paper.
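The procedure above might be sketched as follows. The edge padding, the float-noise tolerance and the function name are our own illustrative choices, not taken from [5][6], and the sign convention of the detection signal is inferred from the crossover rules stated above:

```python
import numpy as np

def detect_peaks(hist, N, tol=1e-9):
    """Sketch of the cdf-based peak detection of Eqs.(1)-(3).

    hist: 1-D gray-level histogram; N: odd peak-detection parameter.
    Returns (s_i, m_i, e_i) triples for the detected peaks.
    """
    c = np.cumsum(hist).astype(float)                # cdf c(n)
    pad = N // 2
    cp = np.pad(c, pad, mode="edge")                 # avoid boundary artifacts
    cN = np.convolve(cp, np.ones(N) / N, "valid")    # Eq.(1): c_N = c * w_N
    r = c - cN                                       # Eq.(3): detection signal
    r[np.abs(r) < tol * max(c[-1], 1.0)] = 0.0       # suppress float noise
    # zero-crossings of r to negative values mark peak starts s_i
    starts = [n for n in range(1, len(r)) if r[n - 1] >= 0 > r[n]]
    peaks = []
    for i, s in enumerate(starts):
        nxt = starts[i + 1] if i + 1 < len(starts) else len(r)
        seg = r[s:nxt]
        pos = np.flatnonzero(seg >= 0)
        if pos.size == 0:
            continue
        m = s + int(pos[0])            # first crossover back to positive: maximum m_i
        e = s + int(np.argmax(seg))    # local maximum of r: end point e_i
        peaks.append((s, m, e))
    return peaks
```

On a clearly bimodal histogram this yields one (s, m, e) triple per mode, with m close to each mode's gray level.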
Obviously, the sensitivity of the above peak detection signal depends on the parameter N in (2), which is referred to as the peak-detection parameter. When this technique is applied to real image histograms, e.g., NOAA-AVHRR image histograms, it is very difficult to determine the value of N because the detected peaks vary with different N. We therefore propose an adjustment method to derive the optimal number of peaks no matter how the parameter N is specified. The adjustment method utilizes the square of the Fisher distance (FD^2) shown in (4) and the Mahalanobis generalized distance (MD) to check two successive peaks under the hypothesis of a Gaussian-like distribution (i.e., a bimodality check).

FD^2 = n(m1 - m2)^2 / (n1 s1^2 + n2 s2^2) .......................(4)

where n1 and n2 are the sample numbers of the two distributions, respectively, and correspondingly, m1, m2 and s1, s2 are their means and standard deviations.
As pointed out in [7], FD^2 attains its maximum at the point farthest from the center of a Gaussian distribution. We calculate FD^2 and MD for two successive peaks denoted by (si, mi, ei) and (si+1, mi+1, ei+1), respectively. If FD^2 attains its maximum outside (mi, mi+1), then we combine the two peaks into a cluster denoted by (si, max(mi, mi+1), ei+1), and repeat this process until no peaks can be combined. MD is used to calculate the percentages of the two peaks as another adjustment criterion, in order to suppress noise and exclude very small peaks.
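The merge test could be sketched as follows. The exhaustive threshold search and the helper `fisher_sq` are illustrative assumptions on our part (the paper gives only Eq.(4)), and the MD-based percentage criterion is omitted:

```python
import numpy as np

def fisher_sq(hist, t, lo, hi):
    """FD^2 of Eq.(4) for splitting hist[lo:hi] at gray level t
    (hypothetical helper; the paper does not give this exact form)."""
    x = np.arange(lo, hi)
    h = hist[lo:hi].astype(float)
    k = t - lo
    n1, n2 = h[:k].sum(), h[k:].sum()
    if n1 == 0 or n2 == 0:
        return 0.0
    m1 = (x[:k] * h[:k]).sum() / n1
    m2 = (x[k:] * h[k:]).sum() / n2
    v1 = (h[:k] * (x[:k] - m1) ** 2).sum() / n1   # s1^2
    v2 = (h[k:] * (x[k:] - m2) ** 2).sum() / n2   # s2^2
    denom = n1 * v1 + n2 * v2
    return (n1 + n2) * (m1 - m2) ** 2 / denom if denom > 0 else 0.0

def should_merge(hist, peak_a, peak_b):
    """Bimodality check for two successive peaks (s, m, e): combine them
    when the split maximizing FD^2 falls outside (m_a, m_b)."""
    (s1, m1, _), (_, m2, e2) = peak_a, peak_b
    best = max(range(s1 + 1, e2), key=lambda t: fisher_sq(hist, t, s1, e2 + 1))
    return not (m1 < best < m2)
```

For two well-separated modes, FD^2 is maximized by a split in the valley between them, so the check correctly declines to merge.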
- Classification of NOAA-AVHRR Images

Fig.1: The flowchart of classification

The flowchart of the classification processing is shown in Fig.1. Firstly, gray-level histograms are generated after NOAA-AVHRR images are received. Secondly, in order to obtain the optimal class number on the basis of the histograms, we apply the above-mentioned peak detection method and the adjustment method to the NOAA-AVHRR image histograms. Then thresholds are determined from the parameters (si, mi, ei) of the detected peaks by equation (5):

ti = int[μ ei + (1 - μ) si+1] .......................(5)

where ti is the threshold, 0 <= μ <= 1, and int[.] denotes the nearest-integer truncation operation. In this paper, we use μ = 0.2 after studying the gray-level histograms of many NOAA-AVHRR images received in different seasons. Finally, the raw images are classified with the thresholds using a rule base, which is formed by studying many examples, and categories determined in the following way.
Categories such as land, cloud, sea and sunglint used in [1][2] are applicable to this system. After checking many image histograms, we find that two peaks corresponding to clouds usually appear in the histogram. Correspondingly, we classify clouds into thick and thin ones, if possible. A few rules are applied to the finally obtained peaks to determine the correspondence between peaks and categories. Moreover, we utilize coastline pictures with the same geometric distortion as the received images, which are overlaid on the quick-look images in [4], to locate land areas even when they are covered with clouds. Therefore, we can represent regions such as land areas occluded by thick or thin clouds with the description and representation methods described later in this paper. The classified results are iconic images with a size of 128 x 120 pixels. Classification is done in two ways according to the acquisition time of the objective NOAA-AVHRR image. Images are grouped into day-time and night-time ones based on whether the data of the visible and near-infrared channels are informative or not.
In the case of day-time images, data of channels 1, 2 and 4 are used for classification. The peak detection method described above is applied to the data of channel 4, and in order to locate cloud-free continental areas, which are difficult to extract by gray-level histograms alone, the normalized difference vegetation index determined by (6)

VI = (ch.2 - ch.1) / (ch.2 + ch.1) .......................(6)

is calculated to distinguish cloud-free land from the other categories (sea, cloud) with a threshold of 0.4, as in [2]. In the case of night-time images, only the data of channel 4 are available, and the classification based on the detected peaks is applied. Fig.2 shows an original NOAA-AVHRR image (ch.4) received at 14:00 on August 5, 1990, and its histogram is shown in Fig.3. The peak detection signal expressed by (3) is shown in Fig.4. In this case, the peak detection parameter N equals 77, the number of peaks is 8, and it is reduced to 3 after adjustment. These three peaks are (177, 389, 406), (615, 702, 766) and (767, 767, 1024). The classified iconic image of Fig.2, which is obtained using the method described above, is illustrated in Fig.5.
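Equation (5) with μ = 0.2 can be illustrated with a short helper (the function name is hypothetical):

```python
def thresholds(peaks, mu=0.2):
    """Eq.(5): t_i = int[mu*e_i + (1 - mu)*s_{i+1}] between successive
    peaks (s, m, e); mu = 0.2 as chosen in the text."""
    return [int(mu * e + (1 - mu) * s_next)
            for (_, _, e), (s_next, _, _) in zip(peaks, peaks[1:])]
```

Applied to the first two peaks of the example above, (177, 389, 406) and (615, 702, 766), this gives t1 = int(0.2*406 + 0.8*615) = 573.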
Fig.2: An original NOAA-AVHRR image (ch.4)
Fig.3: The histogram of Fig.2
Fig.4: The peak detection signal for Fig.3
Fig.5: The classified iconic image of Fig.2

Description and Representation Methods for Image Features

In order to realize the desired retrieval approaches based on image features, it is necessary to describe and represent image features effectively. Some research on content-based image retrieval approaches, such as [8][9], has been carried out so far. In this paper, since the shapes of regions and the spatial relations between regions are important clues for recalling images on the basis of features, two kinds of description methods (chain codes and the edNLC-graph), corresponding to shapes and spatial relations, respectively, are applied to representing NOAA-AVHRR image features.
- Description with Chain Codes

Although many descriptors (e.g., the chain code, Fourier descriptors, Walsh descriptors, etc.) for the shapes of patterns have been proposed up to now, the chain code is one of the most effective descriptions of shape. As pointed out in [11], the chain code, in a database environment, is a better representation than transform-based descriptors such as Fourier or Walsh descriptors.
The labeling and boundary-following approaches for binary images are applied to the iconic images obtained from the original images, and the results are utilized to describe and represent the image features. Firstly, we perform propagation labeling on the basis of the marks assigned to the different classes, using eight-direction codes. Secondly, a recursive method for boundary following is applied to the labeled images. Finally, the regions resulting from the above labeling and boundary-following processing are represented by their boundary chain codes; the number of pixels within each region and the corresponding category of each region are also derived. In addition, the adjacency matrix of all regions described by (7) is generated for the later graph description.

A = [aij], (i, j = 1, 2, ..., m) .......................(7)

where m denotes the number of regions, and aij equals the number of pixels located on the common boundary between the two regions. If two regions are not neighboring, aij is set to zero.
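The labeling step and the adjacency matrix of Eq.(7) might be sketched as follows. The BFS flood fill and the 4-neighbor definition of "pixels on the common boundary" are assumptions on our part, since the paper does not spell these details out:

```python
import numpy as np
from collections import deque

N8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def label_regions(classes):
    """8-connected propagation labeling of a classified iconic image.
    classes: 2-D array of category codes. Returns (labels, categories),
    with region labels numbered from 1."""
    labels = np.zeros_like(classes, dtype=int)
    cats, nxt = [], 0
    H, W = classes.shape
    for i in range(H):
        for j in range(W):
            if labels[i, j] == 0:
                nxt += 1
                cats.append(classes[i, j])
                labels[i, j] = nxt
                q = deque([(i, j)])
                while q:                       # flood-fill one region
                    y, x = q.popleft()
                    for dy, dx in N8:
                        v, u = y + dy, x + dx
                        if (0 <= v < H and 0 <= u < W and labels[v, u] == 0
                                and classes[v, u] == classes[i, j]):
                            labels[v, u] = nxt
                            q.append((v, u))
    return labels, cats

def adjacency(labels, m):
    """Eq.(7): a_ij counts 4-neighbor pixel pairs on the common boundary
    of regions i and j (one assumed reading of the definition)."""
    A = np.zeros((m, m), dtype=int)
    H, W = labels.shape
    for i in range(H):
        for j in range(W):
            for dy, dx in ((0, 1), (1, 0)):
                v, u = i + dy, j + dx
                if v < H and u < W and labels[i, j] != labels[v, u]:
                    a, b = labels[i, j] - 1, labels[v, u] - 1
                    A[a, b] += 1
                    A[b, a] += 1
    return A
```

The resulting matrix is symmetric by construction, matching the undirected notion of a shared boundary.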
- Description with the edNLC-graph

Spatial relations between regions are represented by the edNLC-graph (edge-labeled directed node-label controlled graph). Each region is considered as a node of the edNLC-graph, and the position of each node is represented by the coordinates of its center-of-mass, which are obtained by averaging the x and y coordinates of all boundary points.

An extended form of the edNLC-graph can be defined by (8) as in [10]:

G = (V, E, Σ, Γ, φ) .......................(8)

where V is a finite, non-empty set of nodes, Σ is a finite, non-empty set of node labels, Γ is a finite, non-empty set of edge labels, E is a set of edges of the form (n, l, w), where n, w ∈ V and l ∈ Γ, and φ: V → Σ is a node labeling function.

The set Γ may be considered as a family of non-symmetric binary relations. This means that there exists an edge label l^-1 for each edge label l such that the edges (n, l, w) and (w, l^-1, n) describe the same spatial relation between the regions represented by nodes n and w.
Now, we introduce a relation of simple ordering <= on the set of edge labels Γ = {γ1, ..., γn | γ1 <= ... <= γn}, as in [10], so as to construct an unambiguous string representation of a graph. We use a set of edge labels describing spatial relations in a two-dimensional space, which is illustrated in Fig.6 and ordered as

p <= r <= s <= t <= u <= v <= x <= y

to represent NOAA-AVHRR images according to the coordinates of the center-of-mass of each region.
In this paper, a characteristic description of a node nk is defined as a sevenfold set (nk, c, p, (i1...ir), (o1...oq), (ir1...irr), (or1...orq)), where c is the category and p is the pixel number of this node; r and q are the numbers of edges coming into and going out of this node, respectively; (i1...ir) and (o1...oq) are the index strings, and (ir1...irr) and (or1...orq) the relation strings, of the nodes coming into and going out of this node, respectively.
Fig.6: The ordered set of edge labels
Fig.7: The graph representation of Fig.5
An algorithm for transforming an edNLC-graph into a form of characteristic description is briefly described as follows.

(1) Let an image consisting of n regions k1, ..., kn, described by the coordinates (x1, y1), ..., (xn, yn), be represented by an edNLC-graph G. A node va ∈ G corresponding to a region ka is called an S-node if:

Eq.(9)

(2) We start from the S-node n0 and index it with 1.

(3) We index all the nodes which are adjacent to n0, with the help of the relation <= on the set of labels of the edges connecting the node n0 with its adjacent nodes, by referring to the above adjacency matrix, in increasing order: i = 2, ..., k.

(4) Next, we successively choose the nodes indexed i = 2, ..., k, index all the nodes which are adjacent to them and which have not been indexed up to this moment, and repeat this step for all the nodes.
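Steps (1)-(4) amount to a breadth-first indexing that visits neighbors in edge-label order. A minimal sketch, in which the `edges` adjacency format and the function name are our own illustrative choices:

```python
from collections import deque

# edge labels ordered p <= r <= s <= t <= u <= v <= x <= y (as in the paper)
ORDER = {label: k for k, label in enumerate("prstuvxy")}

def index_nodes(edges, start):
    """Index graph nodes as in steps (1)-(4): start at the S-node, then
    repeatedly index unvisited neighbors in increasing edge-label order.
    edges: dict mapping node -> list of (edge_label, neighbor)."""
    index = {start: 1}            # step (2): the S-node gets index 1
    queue = deque([start])
    nxt = 2
    while queue:
        n = queue.popleft()
        # steps (3)-(4): neighbors visited in edge-label order
        for label, w in sorted(edges[n], key=lambda e: ORDER[e[0]]):
            if w not in index:
                index[w] = nxt
                nxt += 1
                queue.append(w)
    return index
```

Because the edge-label order is fixed, the resulting indexing, and hence the string description built from it, is unambiguous for a given graph.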
The image shown in Fig.5 can be represented by the edNLC-graph shown in Fig.7, and the characteristic descriptions of the nodes with over 20 pixels in the classified image are illustrated in Table 1.
- Indexing of Cloud-Covered Regions

In order to speed up the retrieval processing, and since information about cloud-covered regions is one of the main clues usually given by users, the bit-based operation functions of a general-purpose image processor are applied to an index image, generated according to the positions of the regions covered with clouds, to pre-select the candidate images. Each image is divided into 16 (4 x 4) blocks with a size of 512 x 480 pixels each. One 16-bit pixel (two channels) of the index image is used to describe the cloud information of one raw indexed image, and each bit of the pixel corresponds to one of the 16 blocks. If a block is fully covered with cloud, the value of the corresponding bit is set to 1, and to 0 otherwise.
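In software, the 16-bit index word and the bit-based pre-selection could be mimicked as follows; the row-major block layout and the function names are assumptions for illustration:

```python
def cloud_bits(cloud_blocks):
    """Pack the 4x4 cloud map of one image into a 16-bit index word.
    cloud_blocks: 4x4 nested list of 0/1 (1 = block fully cloud-covered).
    Bit k corresponds to block k in row-major order (assumed layout)."""
    word = 0
    for k, covered in enumerate(b for row in cloud_blocks for b in row):
        if covered:
            word |= 1 << k
    return word

def preselect(index_words, wanted_clear_blocks):
    """Keep candidate images whose index word has 0 in every block the
    user wants cloud-free -- the software analogue of the image
    processor's bit-based operations."""
    mask = 0
    for k in wanted_clear_blocks:
        mask |= 1 << k
    return [i for i, w in enumerate(index_words) if w & mask == 0]
```

A whole archive reduces to one word per image, so candidate selection is a single AND-and-compare per image, which is why the hardware version over the packed index image is so fast.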
Table 1: The characteristic description of Fig.7

Node no. | Class | # of pixels | In_node   | Out_node                | In_relation | Out_relation
1        | 2     | 38          |           | 2,3                     |             | rs
2        | 0     | 3332        | 1         | 3,4,5,6,7,8             | v           | urttuu
3        | 4     | 3924        | 1,2       | 6,7,9,10,11,12          | xr          | rvssss
4        | 2     | 1741        | 2         | 5,11,13,14,15,16        | v           | vtpstt
5        | 1     | 212         | 2,4       | 6,8,11                  | yr          | txs
6        | 2     | 256         | 2,3,5     | 11                      | yvy         | s
7        | 2     | 25          | 2,3       |                         | pr          |
8        | 2     | 23          | 2,5       |                         | ps          |
9        | 2     | 87          | 3         | 11                      | x           | p
10       | 0     | 31          | 3         | 12                      | x           | p
11       | 0     | 2566        | 3,4,5,6,9 | 12,14,15,16,17,18,19,20 | xyxxu       | sypyprru
12       | 2     | 50          | 3,10,11   |                         | xux         |
13       | 0     | 30          | 4         |                         | u           |
14       | 1     | 1906        | 4,11      | 17,21                   | xt          | st
15       | 1     | 24          | 4,11      |                         | yu          |
16       | 4     | 64          | 4,11      |                         | yt          |
17       | 2     | 58          | 11,14     |                         | ux          |
18       | 2     | 26          | 11        |                         | v           |
19       | 4     | 41          | 11        |                         | v           |
20       | 2     | 84          | 11        |                         | p           |
21       | 2     | 42          | 14        |                         | y           |

Note: classes 0, 1, 2 and 4 denote sea, thick clouds, thin clouds and land, respectively.

Image Retrieval
- User Interface

Besides the popular menu-driven user interface, a kind of pictorial user interface is also offered in this system; namely, users can interactively retrieve images in the form of query-by-pictorial-example. The system provides users with two kinds of guide images, which are composed of a global map, as shown in Fig.8, and a local one. The global one indicates the entire coverage from which data from the NOAA satellites can be received at our ground station, and the local one covers the regions around the islands of Japan and is included in the global map, although it is not shown here separately. Users may use one of the guide maps, or both of them if necessary, to compose a sketch image by using the mouse tool of a display instrument. In addition, pictorial examples from the image database or from a received image can also be utilized for retrieving similar images.

Fig.8: The global guide map
- Pre-Selection of Images Based on the Index Image

When users want to retrieve images based on information about cloud-covered regions, the system puts the index image into the image memory of a general-purpose image processor and utilizes its bit-based operations to process the index image, selecting candidate images according to the global cloud-covering information. This kind of processing is very fast: since the capacity of the image memory in this system is 512 x 512 x 4 bytes, the global cloud information of 512 x 512 images can be processed simultaneously. In addition, the maximum elevation and azimuth of the NOAA satellites at the acquisition time are also used to estimate approximately which image includes the desired region and which block in the image is near the desired region.
- Retrieval Model

A new model for image retrieval proposed in this paper is applied to our system. This retrieval model is hierarchical and is composed of two parts, which are based on the structural and syntactic methods widely used in the field of pattern recognition. The first is based on the edNLC-graph representation of images, and is utilized to retrieve images from the viewpoint of the spatial relations between regions. The second is based on the chain-code description of regions, and is utilized to recall those images in which the dominant regions or desired regions are similar in geometric properties.
- Retrieval Based on the Spatial Relations

As the spatial relations between regions within one scene are represented by the edNLC-graph, graph grammars and parsing algorithms must be utilized to analyze the similarity of such graphs for image retrieval. The similarity is a kind of distance measure between graphs. We use the string description of each graph, as shown in Table 1, to calculate the distances between graphs, and the distances can be used to evaluate the candidates and determine the image(s) which satisfy the request from users by doing inexact matching.
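The paper does not fix the distance measure used for the inexact matching; one simple, commonly used realization on the relation strings of Table 1 would be the Levenshtein edit distance, sketched here as an illustrative choice:

```python
def edit_distance(a, b):
    """Levenshtein distance between two relation strings -- a standard
    dynamic program, used here as one possible inexact-matching measure
    (the paper does not specify the exact distance)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # deletions
    for j in range(n + 1):
        d[0][j] = j                       # insertions
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # delete
                          d[i][j - 1] + 1,          # insert
                          d[i - 1][j - 1] + cost)   # substitute
    return d[m][n]
```

Summing such distances over corresponding nodes gives a graph-level score by which candidate images can be ranked.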
- Retrieval Based on Pattern Matching

The shapes of some dominant patterns within an image are also an important retrieval clue for image databases. In the case of remotely sensed images, much information can be derived from the shapes of clouds; for instance, precipitation can be estimated from the shapes of clouds, and we can analyze the weather and atmospheric conditions, because they are mainly dominated by clouds.
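The shape comparison used for this purpose (a curvature chain derived from the chain code as in Eq.(10), scored by the longest common subsequence) can be sketched as follows; the normalization in `similarity` is our own illustrative choice:

```python
def curvature_chain(d):
    """Eq.(10): c_i = ((d_i - d_{i-1} + 11) mod 8) - 3, mapping an
    8-direction chain code to rotation-invariant turn codes in [-3, 4]."""
    return [((d[i] - d[i - 1] + 11) % 8) - 3 for i in range(1, len(d))]

def lcs_length(a, b):
    """Length of the longest common subsequence (standard DP)."""
    m, n = len(a), len(b)
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            L[i + 1][j + 1] = (L[i][j] + 1 if a[i] == b[j]
                               else max(L[i][j + 1], L[i + 1][j]))
    return L[m][n]

def similarity(d1, d2):
    """Normalized LCS similarity of two boundary chain codes
    (1.0 = identical curvature chains; an assumed normalization)."""
    c1, c2 = curvature_chain(d1), curvature_chain(d2)
    return 2 * lcs_length(c1, c2) / max(1, len(c1) + len(c2))
```

Two boundaries traced from different starting directions, e.g. the same square coded as [0,0,6,6,4,4,2,2] and [2,2,0,0,6,6,4,4], yield identical curvature chains, illustrating the rotation invariance claimed below.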
Instead of chain codes, we use a curvature chain whose elements ci are defined from the chain-code elements di by (10), as in [12]:

ci = ((di - di-1 + 11) mod 8) - 3 .......................(10)

Compared with chain codes, this curvature chain is rotation invariant and independent of the start and end points. A similarity evaluation method based on the geometric properties of the corresponding regions is applied in the system. This method consists of finding the longest common subsequence (LCS) of the above curvature chain strings and evaluating the similarity of regions based on the LCSs. It is only applied to the dominant regions or desired regions, because it is computationally expensive. The method used in this system is similar to that used in [11], in which the longest common subsequence method of string matching is applied to the chain-code representation of images.

Concluding Remarks

In this paper we have described the configuration of a NOAA
image database system being developed at our laboratory, and a new image retrieval model in which structural and syntactic methods of pattern recognition are integrated. Because of the implementation of this new image retrieval model, images stored in the system can be recalled not only by attribute information of images in alphanumerical form but also by sketch images based on image features. In this sense, the development of this system is very significant for the construction of multimedia databases, which attract many database researchers' attention at present, as well as for its practical use in the remote sensing field. The system configuration and the proposed approaches to classification, feature extraction and feature description have been presented. In the future, we will investigate the structural and syntactic similarity between images using the proposed image retrieval model from the viewpoints of NOAA-AVHRR image features and human perception.

References
- L.K. Fusco, et al., Earthnet's coordination scheme for AVHRR data, Int. J. Remote Sensing, Vol. 10, pp. 625-636, 1989.
- K. Muirhead, O. Malkawi, Automatic classification of AVHRR images, Proceedings of the 4th AVHRR Data Users' Meeting, pp. 31-34.
- H. Murota, et al., Receiving and Processing System for the Meteorological Satellite (NOAA), Proceedings of the 8th Asian Conference on Remote Sensing, 1987.
- M. Nakayama, et al., Quicklook Images Distribution System for the Meteorological Satellite (NOAA), IE87-89 (in Japanese).
- M.I. Sezan, A Peak Detection Algorithm and Its Application to Histogram-Based Image Data Reduction, Computer Vision, Graphics, and Image Processing, pp. 36-51, 1990.
- M.I. Sezan, et al., Automatic Anatomically Selective Image Enhancement in Digital Chest Radiography, IEEE Transactions on Medical Imaging, Vol. 8, No. 2, pp. 154-162, June 1989.
- T.Y. Phillips, et al., O(log n) Bimodality Analysis, Pattern Recognition, pp. 741-746, 1989.
- A. Yamamoto, M. Takagi, Extraction of Object Features and Its Application to Image Retrieval, Transactions of IEICE, Vol. E72, No. 6, pp. 771-781, 1989.
- S.K. Chang, et al., Iconic Indexing by 2-D Strings, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-9, No. 3, pp. 413-428, May 1987.
- M. Flasinski, Parsing of edNLC-graph grammars for scene analysis, Pattern Recognition, Vol. 21, No. 6, pp. 623-629, 1988.
- W.I. Grosky, Y. Lu, Iconic Indexing Using Generalized Pattern Matching Techniques, Computer Vision, Graphics, and Image Processing, Vol. 35, pp. 383-403, 1986.
- M.J. Eccles, et al., Analysis of the digitized boundaries of planar objects, Pattern Recognition, Vol. 9, pp. 31-41, 1977.