Talking GIS: A Theoretical Basis

1. Background

People often use qualitative spatial thinking and reasoning in everyday life (Kuipers 1978; Lynch 1960); however, current commercial GISs primarily support quantitative spatial queries and answers, and in general lack direct support for qualitative queries about spatial relations. For example, point-to-point distance in metric units or direction in degrees does not necessarily conform to people's usage of terms and concepts in natural language. People's spatial concepts are often more qualitative in nature, including such concepts as north, near, and far (Hong 1994; Frank 1992). A qualitative GIS model that is capable of answering queries of the form "Show me the road that goes around the Acadia National Park" will complement the existing models. The incorporation of natural-language spatial relations such as along, goes by, and through will give GIS users a greater choice in formulating queries, depending on the task being performed. By better accommodating human requirements, qualitative models will also contribute toward the greater utilization of GIS technology. Among the potential applications of such a model are road navigation systems, tourist databases, and evidence gathering from a security or judicial perspective.

In creating natural-language spatial-query models, a cross-linguistic study will be of value, as it can be used to gauge the versatility of the mathematical formalism utilized, as well as to calibrate the parameters of the mathematical formalism for different cultural and linguistic groups. Although GIS command languages are mostly written in English with the cognitive models of native English speakers, the GIS user community is largely composed of non-native English speakers (Campari 1994).
As GIS database queries are often an expression of spatial relationships, it seems that "use errors will be reduced if the names chosen for such spatial predicates match the common sense meanings of those terms" (Mark and Egenhofer 1994). As such, calibrating the spatial predicates of a particular natural language against a formal mathematical model of spatial relations would give GIS developers a tool for designing GIS databases that are capable of answering spatial queries unambiguously in the language concerned.

2. Problem Statement

Formal models currently used in commercial GISs lack the parameters to accommodate the flexibility that natural language has in partitioning space. Thus, it is necessary to identify which parameters play a significant role in the selection of spatial predicates when people describe spatial relations, and which parameters do not influence the meaning of a particular term. Knowledge about these parameters will allow us to develop formal models and to calibrate them to better fit human intuition.

3. Motivation

The primary motivation for this work is the need for a mathematical formalism that is flexible enough to accommodate the natural-language spatial queries that people may pose to a GIS. The cross-linguistic dimension of this problem is particularly challenging, as such a mathematical formalism could be used in the transfer of data sets across cultures while still allowing for the correct determination of the semantics of the spatial relations. Within the framework of a Global Information Infrastructure (GII), the results from this research can be used in building spatial data transfer standards at a local level. For example, a Spatial Data Transfer Standard for Malaysia requires an efficient link not only among databases within Malaysia, but also to the rest of the GII as it will allow users.
Geographic databases typically comprise digital, geometric representations of real-world entities. These entities may be represented as objects that consist of points, lines, and areas, whose relationships are described geometrically, for example by using concepts such as a point being the intersection of two lines or two points being located on opposite sides of a line. Spatial queries in a GIS, however, contain symbolic representations of spatial relations rather than detailed descriptions of the geometry. Examples of such symbolic representations are terms like through or along. People more frequently describe a "road that goes along the park" than a "road that coincides with the park's boundary". To create formalisms of natural-language spatial relations, there is a need to map the symbolic representations of real-world spatial relations onto valid geometric configurations. Such a mapping would likely be a one-to-many relation between the symbols and the geometry, because natural language has a limited set of words to match the infinite number of geometric configurations that can be constructed.

The goal of this research is to develop a formal model that captures the essence of geometric configurations. The essential geometry that is captured should satisfy a particular natural-language query about a spatial relation and should also be capable of describing a particular geometric configuration within a database using the best natural-language description. In the first case, all geometric configurations within a database that fit the query description should qualify as answers to the query. In the latter case, a user may be interested in finding the best natural-language description for a particular geometric configuration in a database, and the system should be capable of responding with a prototypical description of that configuration.
Several other models for natural-language spatial relations have been defined based on linguistic, geometrical, and connectionist approaches; however, none sufficiently addresses the linkage to the data models used in geographic information systems. The models used by linguists are primarily based upon introspection (Herskovits 1986; Talmy 1983). Although they provide good research directions, their lack of mathematical objectivity precludes their use in information systems. Current GISs are based on continuous measures such as Cartesian coordinates. If natural-language spatial terms were based directly on these continuous measures, then there would be an infinite number of terms, as continuous measures are subject to infinite variability. The direct usage of Cartesian coordinates as a basis for natural-language spatial descriptors is thus inappropriate. However, by discretizing the continuous measures, for example by determining the range of values for particular spatial terms, a mapping between the discretized values and spatial terms, and vice versa, can be made. Such an approach will be beneficial to a GIS, because it can be easily implemented using current query tools such as SQL. The connectionist model, which in principle attempts parallel computing using the human brain as a model, is another method that has been used to represent natural-language spatial relations (Regier 1995). This approach involves training the model on a very comprehensive set of geometrical configurations in order to achieve the desired level of results. Other formal models that have been proposed include the 9-intersection model (Egenhofer and Herring 1991), which is a purely topological model; models for cardinal directions (Frank 1992; Peuquet and Ci-Xiang 1987); and models for approximate distances (Hernandez et al. 1995; Hong et al. 1995).
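The discretization idea described above can be illustrated with a minimal sketch. It assumes a single hypothetical distance measure and two illustrative terms (near and far) with made-up cut-off values; the table name `term_range`, the helper names, and the thresholds are assumptions for illustration and are not taken from this research. The sketch stores one value range per term in an SQL table and queries it in both directions: from a measured value to a term, and from a term back to its range.

```python
import sqlite3


def build_term_table(conn):
    """Create a hypothetical lookup table: one half-open range [lo, hi) per term."""
    conn.execute("CREATE TABLE term_range (term TEXT, lo REAL, hi REAL)")
    conn.executemany(
        "INSERT INTO term_range VALUES (?, ?, ?)",
        [
            ("near", 0.0, 1.0),   # illustrative cut-off, not a calibrated value
            ("far", 1.0, None),   # NULL upper bound means "unbounded"
        ],
    )


def term_for(conn, value):
    """Map a continuous measure to the term whose range contains it."""
    row = conn.execute(
        "SELECT term FROM term_range "
        "WHERE lo <= ? AND (hi IS NULL OR ? < hi)",
        (value, value),
    ).fetchone()
    return row[0] if row else None


def range_for(conn, term):
    """Map a term back to its range of values (the reverse direction)."""
    return conn.execute(
        "SELECT lo, hi FROM term_range WHERE term = ?", (term,)
    ).fetchone()


conn = sqlite3.connect(":memory:")
build_term_table(conn)
print(term_for(conn, 0.4))
print(range_for(conn, "far"))
```

Half-open ranges are used here so that every value falls into at most one term; a calibrated table could just as well use overlapping ranges if several terms are acceptable for the same configuration.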
Although the exact geometry in a set of configurations showing a road and a park, symbolized by a dark line and a polygon, respectively, may not be identical, people may use the same term to describe several distinct configurations (Figure 1). For example, the three configurations in Figure 1 have both end points of the line outside the polygon, and at least some part of the line's interior intersects the interior of the polygon. In essence, they have the same topology, and the same term through is used to describe all three configurations.

Figure 1. Similar term through

On the other hand, different terms may be used to describe configurations even though the topology of these configurations is the same. For example, Figure 2a could be described by the term goes along, as in "The road goes along the park," while Figure 2b may be better described by "The road is outside the park." Hence, topology cannot be the only criterion for determining the appropriate spatial terms. Metric distinctions may also be necessary.

Figure 2. Different terms (a) along and (b) outside

The hypothesis for this research is "For the same topology, metric influences the choice of natural-language spatial terms." The hypothesis is proved by:
5. Scope of the Study

The GIS literature frequently cites three types of spatial relations as the most fundamental ones (Pullar and Egenhofer 1988; Worboys 1992): Topological relations are those spatial relations that are invariant under continuous transformations with continuous inverses.
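As a hedged illustration of what a purely topological description captures, the sketch below encodes a line-region relation as a 3x3 matrix of booleans in the spirit of the 9-intersection model: each entry records whether the intersection of the line's interior, boundary (its end points), or exterior with the region's interior, boundary, or exterior is non-empty. The function name `matches_through_condition` and the hand-written example matrices are assumptions for illustration; the test only encodes the condition stated for Figure 1 (both end points outside the polygon, line interior meeting the polygon interior) and is not presented as the model's definition of through.

```python
from typing import Tuple

Row = Tuple[bool, bool, bool]
Matrix = Tuple[Row, Row, Row]

# Row index: part of the line. Column index: part of the region.
INTERIOR, BOUNDARY, EXTERIOR = 0, 1, 2


def matches_through_condition(m: Matrix) -> bool:
    """True if both end points lie outside the region and the line's
    interior intersects the region's interior."""
    endpoints_outside = (
        m[BOUNDARY][EXTERIOR]
        and not m[BOUNDARY][INTERIOR]
        and not m[BOUNDARY][BOUNDARY]
    )
    interior_meets_interior = m[INTERIOR][INTERIOR]
    return endpoints_outside and interior_meets_interior


# Hand-coded example matrices (True = non-empty intersection).
ROAD_CROSSES_PARK: Matrix = (
    (True, True, True),     # line interior meets all three parts of the region
    (False, False, True),   # both end points are in the region's exterior
    (True, True, True),
)
ROAD_OUTSIDE_PARK: Matrix = (
    (False, False, True),
    (False, False, True),
    (True, True, True),
)
ROAD_ENDS_IN_PARK: Matrix = (
    (True, True, True),
    (True, False, True),    # one end point is in the region's interior
    (True, True, True),
)

for name, matrix in [
    ("crosses", ROAD_CROSSES_PARK),
    ("outside", ROAD_OUTSIDE_PARK),
    ("ends inside", ROAD_ENDS_IN_PARK),
]:
    print(name, matches_through_condition(matrix))
```

Because the matrix records only whether intersections are empty, stretching or bending the road or the park leaves it unchanged, which is exactly why two configurations with the same matrix may still call for different terms.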
In order to allow for qualitative queries, the semantics of natural-language spatial relations must be defined and represented within a GIS. A variety of properties may contribute to the choice of a particular spatial term (Mark et al. 1995), such as the natural language (English vs. Spanish), the culture, the semantics of the spatial objects involved, the tasks users envision, the context in which the objects are presented, the pictorial presentation (a sketch vs. a topographic map), and the objects' geometries. We will concentrate here on the geometrical aspects. The approach taken here is a refinement of the 9-intersection model (Egenhofer and Herring 1991) to accommodate more of the semantics of natural-language spatial relations. The 9-intersection model itself proved to be highly successful in the first round of human subject testing (Mark and Egenhofer 1992; Mark and Egenhofer 1994a; Mark and Egenhofer 1994b; Mark and Egenhofer 1995). The refined model in this research will thus provide a framework for further testing in the second iteration of the human subject testing procedures. Combinations of topological relations involve points, lines, and regions. The treatment of topological relations and spatial predicates in this research will be confined to line-region relationships only. For the purpose of carrying out the calibration of symbols, we focused on the road and park relationship (Mark and Egenhofer 1995).

6. Topics Excluded from the Present Investigation

Related issues that have been excluded from this investigation are:
The major finding of this research is that it is possible to create a formal model for representing natural-language spatial relations. The model identifies metric parameters that are a refinement of topology. The results revealed that these proposed parameters provide the discriminatory function in the grouping of the natural-language terms. For terms with the same topology, it was proven that metric influences the choice of natural-language spatial terms. It was also found that when a parameter is significant for a spatial term in the prototypical test, then in the agreement task (Mark 1996) there is a corresponding significant increase in agreement for that spatial term in describing a configuration in which this parameter is present. This observation was found to be true for the English data sets that were tested. As these findings are based on the analysis of two separate and distinct data sets, they indicate that the results from the prototypical tests are not data dependent. The formal model has the flexibility of having its parameters calibrated for particular spatial terms as well as for particular domains of users. Based on the data available, the minimum and maximum extents of the parameter values were calibrated for each group as well as for each natural-language spatial term used in this investigation. These results were compiled to create a Metric Table of Spatial Terms. This table is significant because it allows spatial terms to be discriminated on the basis of topological and metric parameters.
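The calibration step described above, recording the minimum and maximum extent of a parameter for each spatial term, can be sketched as follows. This is a minimal illustration under stated assumptions: the single unnamed metric parameter, the observation values, and the function names `calibrate` and `candidate_terms` are hypothetical and do not reproduce the Metric Table of Spatial Terms or the data of this research.

```python
from collections import defaultdict


def calibrate(observations):
    """Build a term -> (min, max) table from (term, parameter_value) pairs."""
    values = defaultdict(list)
    for term, value in observations:
        values[term].append(value)
    return {term: (min(vals), max(vals)) for term, vals in values.items()}


def candidate_terms(table, value):
    """Return, in alphabetical order, every term whose calibrated
    range contains the given parameter value."""
    return sorted(
        term for term, (lo, hi) in table.items() if lo <= value <= hi
    )


# Hypothetical observations of one metric parameter for two terms
# that share the same topology (road disjoint from park).
observations = [
    ("goes along", 0.02),
    ("goes along", 0.10),
    ("is outside", 0.08),
    ("is outside", 0.60),
]

table = calibrate(observations)
print(table)
print(candidate_terms(table, 0.30))
```

Ranges calibrated this way may overlap, in which case more than one term is a candidate for the same configuration; choosing the single best description would need additional information, such as agreement scores from subject testing.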