Reconstruction of long term
land cover changes by a maximum likelihood interpolation method using
genetic algorithm
Masahiko Nagai, Ryosuke
Shibasaki, Huang Shaobo Center for Spatial Information Science,
University of Tokyo Cw-503, Block C, 4-6- 1 Komaba Meguro- ku, Tokyo
153-8505, Japan TEL & FAX: +81- 3- 5452- 6417 E mail mailto:nagaim@iis.u%20-%20tokyo.ac.jp Japan
Abstract Even
though long term land cover change is very important in various fields
such as global environmental studies, only fragmentary data has been
available. The interpolation method is applied to reconstruct long term
land cover changes from fragmentary observational data and knowledge of
the changes. Genetic- Algorithm (GA) is used as interpola tion method.
This method is very advantageous when the density of observational data is
low because it can create most probable spatio-temporal distribution of
class variables under the fragmentary observational data and behavioral
models.
Introduction
- Introduction
It is very important to have an adequate
knowledge of long term land cover change for understanding what is
happening in the present and may happen in the future. Human activities
have modified the natural environment significantly, while it has
recently become clear that during the last centuries the intensity and
scale of these influences have increased very much. Although long term
land cover change is very important, only fragmentary data has been
available. The maximum likelihood interpolation method using genetic
algorithm is applied to reconstruct long term land cover.
- Introduction of genetic algorithm(GA)
Genetic algorithm
(GA) is the search algorithm that is based on the mechanisms of natural
selection and evolution of natural genetics. The approach combines
survival of the fittest among string structures. Genetic algorithm is
computational simple and powerful in their search without restrictive
assumptions about search spaces. In a simple genetic algorithm , five
basic aspects are considered ; the representation or coding of the
problem, the initialization of the population, the definition of the
evaluation function, the definition of genetic operators, and the
determination of parameters.
- Optimization scheme for nominal variable
interpolation
Most of natural properties change along a
continuous scale. Spatial continuity and temporal continuity give
rationale for interpolating fragmentary observational data. There are
many models now for knowledge and rules governing spatio-tem poral
patterns and behavior of geographic objects. They can provide more
robust and quantitative basis for interpolating observational data.
Reliability of result estimated from model simulation can be improved by
combining reliable observation data. It is reasonable to assume that
spatio- temporal events or the voxel- field of nominal variables should
maximize likelihood under give observational data and behavioral models.
Observational data and behavioral models can be integrated in the
process of maximizing the likelihood of spatio-temporal events. Genetic
algorithm is applied as a optimization scheme because searching for the
most likely spatio-temporal or voxel-field of nominal data is a typical
combinatorial optimization problem. Data
- Input Data
1) History database of the global environment
(HYDE) In this study, History Database of the global Environment (HYDE)
is used for input data, such as potential and actual land cover data and
fragmentary observational data. This datar, HYDE, has natural background
vegetation based on the BIOME model (Prentice et al, 1992). The biome
model of Prentice et al. (1992) is the first used to select which plant
types may potentially be present at a particular site. This rule-base
captures the effects of minimum tem perature tolerances and chilling
requirements on determining the distributions of different plant types.
2) Land cover data at the start year and the end year: In this
study, year 1700 of HYDE is used as a land cover data at the start year
and year 1990 of HYDE are used as a land cover data at the end year. In
land cover data at the start and end year, agriculture land, pasture,
human s ettlement, and intensive agriculture are considered as the grass
class. To generate the simulation data of land cover changes from the
start year to the end year, land cover data of HYDE at the start year
and the end year are modified in to five classes that is explained in
2.2.
3) Point based observational data: Point based
observational data were collected from the HYDE. Poi nts of human
activities such as agriculture, pasture, and human settlements are
picked up. These collected point based observational data are the year
of 1750, 1800, 1850, 1900, 1950, and 1970 of HYDE.
4)
Cultivation intensity Cultivation intensity data are overlaid to
represent the impact of agricultural activities. Cultivation intensity
data is used to modify the actual vegetation data. In this study, if the
pixel has value more than 50% in cultivation intensity data, the class
in the representing area in actual vegetation will be modified to
intensive agriculture.
5) Total area of agriculture area: Total
area of agriculture area is strongly related with the total population.
In the interpolation, a restricted condition should be taken into
account in which the total agricultural area should be proportional to
the total population. However, the computational work can be very heavy
if the restricted condition is clearly.taken into account. To avoid
this, the interval of the time-slice in the interpolation neighboring
time slices which results in an almost constant growth rate of
agriculture area expansion between the neighboring time slice.
Therefore, knowledge on the land cover changes can be very much
simplified.
6) Transitional probability: Knowledge on land cover
change is given in terms of transitional probability from one class to
another. Transitional probability changes according to regional
condition. In areas which are climatologically suitable for high
agriculture, the transitional probability from forest or grassland to
agriculture areas is relatively high. In areas where the possibility of
wind erosion is high, the transitional probability from grassland to
barren or desert area is relatively high. In areas with a very high
suitability of agriculture, ordinary agricultural areas are likely to
change to intensive agriculture area.
- Class category and time interval
In this study, class
category is divided in to five land cover classes, forest, grass, APH,
IA, barren-l and, and water. APH means Agriculture land / Pasture /
Human Settlement, and IA means Intensive Agriculture. This class
category is different from the categories used in the input data, “
History Database of the global Environment (HYDE)”. These data should be
reclassified to match the new five class category. In this study, result
of long term land cover changes is from 1900 to 1990. The time interval
of the result is 10 years. Processing
- Three dimensional representation of an individual
(coding)
In this study, a three dimensional array is defined to
represent an individual in a space and time domain. The horizontal plane
represents two dimensional spaces, and the vertical dimension represents
temporal dimension.
- Initialization of population
An initial population for a
gene tic algorithm is selected unsystematically. One random trial is
made to produce each individual. On the other hand, value of each member
of initial population is the same because all members of the initial
population are automatically selected by the same procedure.
- Definition and computation of an individual’s fitness
1)
Spatio-temporal behavioral models of class variable data In the GA based
interpolation, any types of behavioral models can be applied if they can
determine the probability of every possible behavior and transition of
nominal or class variables. For nominal variable data, possible changes
in a class at one pixel are basically defined by the probability of the
changes from one class to another. In this study, transitional
probability is determined by the combination of classes in the
neighborhood. Spatial and temporal relations affected the transitional
probability in three ways. The first is spatial continuity which is
based on the assumption that the same class data tend to continue in the
spatial demotion. The second is temporal continuity which is an
extension of the spatial continuity to the temporal domain. The third is
expansion contraction relations which is based on the assumption that
some data class have a higher possibility of expanding their area at the
next time slice while others tend to contract.
2) Definition and
computation of fitness of an individual Fitness of an individual is
defined by the combination on behavioral fitness and observational
fitness. Behavioral fitness is the combined probability of a change in
events of nominal variables under the condition that these changes
follow a given probabilistic behavioral model or rule. Observational
fitness is the combined probability that the observational nominal
values occur under probabilistic functions of observational error or
uncertainties. Observational probability can be determined by accuracy,
resolution and frequency of observation. Overall fitness can be computed
by multiplying behavioral fitness and observational fitness. Thus,
behavioral or structural models and observational data can be integrated
by optimizing the overall fitness.
- Definition of operators
1) Reproduction Reproduction is a
process in which individual strings are copied according to their
objective function values or the fitness values. Copying strings
according to their fitness values means that strings with a higher value
have a higher probability of contributing one or more offspring in the
next generation. 2) Crossover The crossover operator first randomly
mates newly reproduced individuals in the mating pool. It then randomly
locates a window of random size for a pair of individuals. Finally, the
contents of the individual within the window are swapped to create new
individuals. 3) Mutation Mutation is a genetic operator that alters one
or more gene values in a chromosome from its initial state. This can
result in entirely new gene values being added to the gene pool.
Mutation is an important part of the genetic rearch to prevent the
population from stagnating at any local optima.
- Improvement of the search
1) Hill-climbing method If the
complex space of problem resolutions becomes larger and larger, the
population size and the generation size have to be increased bigger and
bigger at same time. The efficiency of GA is one of the weak point to
real world application of the GA. Hill- climbing is a good method of a
search strategy that exploits the best among know possibilities for
finding a improved solution. In this study, the potential for combining
the Hill-climbing strategy with GA was investigated.
2)
Population diversity Premature convergence is caused by early emergence
of an individual that is better than the others in the population,
although far from optimal. To avoid premature convergence , one has to
avoid the loss of population diversity. Although reducing the
reproduction number cannot always eliminate premature convergence, it
can be used as a simple way to reduce rapid convergence. In this study,
the duplicated number of individuals was limited less than two. If the
individual’s expected duplicated number is larger that two, it was set
equal to two. Result Figure 1 shows the result of
reconstruction of l ong term , year 1900 to 1990 by 10 years interval,
land cover changes by a maximum likelihood i nterpolation method using
genetic algorithm. Because long-term changes in climatologic variables are
ignored, moreover the number and quality of the point based observational
data are limited, the reconstruction results cannot be validated against
the other observational data. Nevertheless, the reconstructed results show
a reasonable fitting both to the observational data and the knowledge of
the changes. It can be concluded that by applying more accurate and
reliable scientific data and knowledge of climate changes, long term land
use and land cover changes can be reconstructed more accurately.
Conclusion In this study, the interpolation method is
applied to reconstruct long term land cover changes from fragmentary
observational data and knowledge of the change. Genetic algorithm and hill
climbing can be successfully applied to the combinatorial optimization of
nominal voxel-field data. This maximum likelihood interpolation method
using genetic algorithm has reconstructed the long term land cover changes
by every 10 years interval from year 1900 to 1990. And this reconstructed
land cover changes shows a reasonable fitting both to the observational
data and the knowledge of the change.
References
- Ryosuke Shibasaki and Shaobo Huang (2000). Integration of
observational data and behavioral models for spatio temporal
interpolation- application to reconstructing long term land use and land
cover changes, Present and Future of Modeling Global Environmental
Change Toward Integrated modeling, T. Matsuno and H .Kida, ed. Terra
Scientific Publishing Company, Tokyo, 293- 309.
- Klein Goldewijk, K. (2001). Estimating global land use change over
the past 300 years: The HYDE database. Global Biogeochemical Cycles
15(2): 417-434.
- Prentice, I.C., Cramer, W., Harrison, S.P., Leemans, R., Monserud,
R.A., and Solomon, A.M. (1992). A global biome model based on plant
physiology and dominance, soil properties and climate. J. Biogeogr., 19:
117-134.
Figure 1. Reconstructed long
term global land use and cover change |