Talking GIS : A Theoretical Basis

Abdul Rashid Bin Mohamed Shariff Ph. D.
E-mail: rashid@itc.utm.my

1. Background
People often use qualitative spatial thinking and reasoning in everyday life (Kuipers 1978; Lynch 1960); however, current commercial GISs primarily support quantitative spatial queries and answers, and in general lack direct support for users to make qualitative queries about spatial relations. For example, point-to-point distance in metric units or direction in degrees does not necessarily conform to people's usage of terms and concepts in natural language. People's spatial concepts are often more qualitative in nature, including such concepts as north, near, and far (Hong 1994; Frank 1992). A qualitative GIS model that is capable of answering queries of the form, "Show me the road that goes around the Acadia National Park" will complement the existing models. The incorporation of natural-language spatial relations such as along, goes by, and through will give GIS users a greater choice in formulating queries, depending on the task being performed. By better accommodating human requirements, qualitative models will also contribute toward the greater utilization of GIS technology. Among the potential applications of such a model are road navigation systems, tourist databases, and evidence gathering from a security or judicial perspective.

In creating natural-language spatial-query models, a cross-linguistic study is of value because it can be used to gauge the versatility of the mathematical formalism utilized, as well as to calibrate the parameters of the mathematical formalism for different cultural and linguistic groups. Although GIS command languages are mostly written in English with the cognitive models of native English speakers, the GIS user community is mostly composed of groups of non-native English speakers (Campari 1994). As GIS database queries are often an expression of spatial relationships, it seems that "use errors will be reduced if the names chosen for such spatial predicates match the common sense meanings of those terms" (Mark and Egenhofer 1994). As such, calibrating the spatial predicates of a particular natural language against a formal mathematical model of spatial relations would give GIS developers a tool for designing GIS databases that are capable of answering spatial queries unambiguously in the particular language concerned.

2. Problem Statement
Formal models currently used in commercial GISs lack the parameters to accommodate the flexibility that natural language has in partitioning space. Thus, it is necessary to identify which parameters play a significant role in the selection of spatial predicates when people describe spatial relations, and which parameters do not influence the meaning of a particular term. Knowledge about these parameters will allow us to develop formal models and to calibrate them to better fit human intuition.

3. Motivation
The primary motivation for this work stems from the need for a mathematical formalism that is flexible enough to accommodate the natural-language spatial queries that people may ask of a GIS. The cross-linguistic dimension of this problem is particularly challenging, as such a mathematical formalism could be used in the transfer of data sets across cultures while still allowing for the correct determination of the semantics of the spatial relations.

Within the framework of a Global Information Infrastructure (GII), the results from this research can be used in building spatial data transfer standards at a local level. For example, a Spatial Data Transfer Standard for Malaysia requires an efficient link not only among databases within Malaysia, but also to the rest of the GII, as it will allow users:

to exchange explicitly stored relations to save time and computer resources, as users will receive a set of explicit spatial relationships and need not derive them again from the dataset;
to carry out consistency evaluation tests; for example, consistency tests can be carried out after the data transfer process to ensure that the interrelationships between the various individual spatial objects have been correctly transferred; and
to consistently query different systems based on a set of standardized spatial relations.

4. Goal and Hypothesis
Geographic databases typically comprise digital, geometric representations of real-world entities. These entities may be represented as objects that consist of points, lines, and areas, whose relationships are described geometrically, for example by using concepts such as a point being the intersection of two lines or two points being located on opposite sides of a line. Spatial queries in a GIS, however, contain symbolic representations of spatial relations rather than detailed descriptions of the geometry. Examples of such symbolic representations are terms such as through or along. People more frequently describe a "road that goes along the park" rather than a "road that coincides with the park's boundary".

To create formalisms of natural-language spatial relations, there is a need to map the symbolic representations of real-world spatial relations onto valid geometric configurations. Such a mapping would likely be a one-to-many relation between the symbols and the geometry, because natural language has a limited set of words to match the infinite number of geometric configurations that can be constructed.

The goal of this research is to develop a formal model that captures the essence of geometric configurations. The essential geometry that is captured should satisfy a particular natural-language query about a spatial relation and should also be capable of describing a particular geometric configuration within a database using the best natural-language description. In the first case, all geometric configurations within a database that fit the query description should qualify as answers to the query. In the latter case, a user may be interested in finding the best natural-language description for a particular geometric configuration in a database, and the system should be capable of responding with a prototypical description of that configuration.

Several other models for natural-language spatial relations have been defined based on linguistic, geometrical, and connectionist approaches; however, none sufficiently addresses the linkage to the data models used in geographic information systems. The models used by linguists are primarily based upon introspection (Herskovits 1986; Talmy 1983). Although they provide good research directions, their lack of mathematical objectivity precludes their use in information systems. Current GISs are based on continuous measures such as Cartesian coordinates. If spatial terms were based directly on these continuous measures, then there would be an infinite number of terms, as continuous measures are subject to infinite variability. The direct usage of Cartesian coordinates as a basis for natural-language spatial descriptors is thus inappropriate. However, by discretizing the continuous measures, for example by determining the range of values for particular spatial terms, a mapping between the discretized values and spatial terms, and vice versa, can be made. Such an approach will be beneficial to a GIS, because it can be easily implemented using current query tools such as SQL.
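The discretization idea can be illustrated with a minimal Python sketch. The term names, the range boundaries, and the function names below are hypothetical placeholders chosen for illustration; they are not calibrated values or definitions from this research.

```python
from bisect import bisect_right
from math import inf

# Hypothetical terms and range boundaries in metres. The numbers are
# placeholders for illustration, not calibrated values from this research.
TERMS = ["very near", "near", "far"]
BOUNDS = [100.0, 1000.0]  # BOUNDS[i] separates TERMS[i] from TERMS[i + 1]


def distance_to_term(distance_m: float) -> str:
    """Map a continuous distance onto one discrete spatial term."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return TERMS[bisect_right(BOUNDS, distance_m)]


def term_to_range(term: str) -> tuple:
    """Inverse mapping: the half-open range [lower, upper) covered by a term."""
    i = TERMS.index(term)
    lower = 0.0 if i == 0 else BOUNDS[i - 1]
    upper = inf if i == len(BOUNDS) else BOUNDS[i]
    return (lower, upper)
```

The inverse mapping is the part that a range-based database query would rely on: a qualitative term is translated into a pair of bounds, and the bounds are then used as an ordinary range condition on the stored distance.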

The connectionist model, which in principle attempts parallel computing using the human brain as a model, is another method that has been used to represent natural-language spatial relations (Regier 1995). This approach involves the training of the model using a very comprehensive set of geometrical configurations in order to achieve the desired level of results. Other formal models that have been proposed include the 9-intersection model (Egenhofer and Herring 1991), which is a purely topological model; models for cardinal directions (Frank 1992; Peuquet and Ci-Xiang 1987); and models for approximate distances (Hernandez et al. 1995; Hong et al. 1995).

Although the exact geometry in a set of configurations showing a road and a park, symbolized by a dark line and a polygon, respectively, may not be identical, people may use the same term to describe several distinct configurations (Figure 1). For example, the three configurations in Figure 1 have both end points of the line outside the polygon and at least some part of the line's interior intersects with the interior of the polygon. In essence, they have the same topology, and the same term through is used to describe all three configurations.

Figure 1. Similar term through
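The shared topology of the three configurations can be written as a condition on a 9-intersection matrix. The sketch below is illustrative only: it assumes the matrix is already available as a 3 x 3 table of booleans (rows for the line's interior, boundary, and exterior; columns for the region's interior, boundary, and exterior), and the function name is hypothetical.

```python
from typing import Tuple

Row = Tuple[bool, bool, bool]
Matrix = Tuple[Row, Row, Row]

# Index of each part of an object in the 3 x 3 matrix.
INTERIOR, BOUNDARY, EXTERIOR = 0, 1, 2


def matches_through_topology(m: Matrix) -> bool:
    """True if m[line part][region part] encodes the topology described above.

    An entry is True when the two parts have a non-empty intersection.
    """
    # Both end points (the line's boundary) lie outside the region.
    ends_outside = (
        not m[BOUNDARY][INTERIOR]
        and not m[BOUNDARY][BOUNDARY]
        and m[BOUNDARY][EXTERIOR]
    )
    # Some part of the line's interior lies inside the region's interior.
    crosses_interior = m[INTERIOR][INTERIOR]
    return ends_outside and crosses_interior


# A road that crosses a park, and a road that stays entirely outside it.
ROAD_CROSSES_PARK: Matrix = (
    (True, True, True),
    (False, False, True),
    (True, True, True),
)
ROAD_OUTSIDE_PARK: Matrix = (
    (False, False, True),
    (False, False, True),
    (True, True, True),
)
```

Any configuration whose matrix satisfies this condition belongs to the same topological group, regardless of how its exact geometry differs.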
On the other hand, different terms may be used to describe configurations even though the topology of these configurations is the same. For example, Figure 2a could be described by the term goes along as in, "The road goes along the park," while Figure 2b may be better referred to by, "The road is outside the park." Hence, topology cannot be the only criterion for determining the appropriate spatial terms. Metric distinctions may also be necessary.

Figure 2. Different terms (a) along and (b) outside
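A small sketch shows how a metric measure could separate two configurations that share the same topology. The minimum distance between road and park is used here purely as a stand-in for illustration; it is not one of the parameters defined in this research, and the threshold value and function names are hypothetical.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:  # degenerate segment: a and b coincide
        return hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the line, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def min_distance(road, park):
    """Minimum distance between a polyline and a polygon boundary.

    Assumes the road neither touches nor crosses the park boundary, so the
    minimum is always reached at an end point of one of the segments.
    """
    ring = list(zip(park, park[1:] + park[:1]))  # closed boundary segments
    best = float("inf")
    for r0, r1 in zip(road, road[1:]):
        for p0, p1 in ring:
            best = min(
                best,
                point_segment_distance(r0, p0, p1),
                point_segment_distance(r1, p0, p1),
                point_segment_distance(p0, r0, r1),
                point_segment_distance(p1, r0, r1),
            )
    return best


ALONG_MAX_DISTANCE = 5.0  # hypothetical threshold, not a calibrated value


def describe(road, park):
    """Pick a term for a road that is topologically outside the park."""
    if min_distance(road, park) <= ALONG_MAX_DISTANCE:
        return "goes along"
    return "is outside"


park = [(0, 0), (10, 0), (10, 10), (0, 10)]
road_a = [(0, -1), (10, -1)]    # close to the park boundary
road_b = [(0, -50), (10, -50)]  # same topology, much farther away
```

Both roads have the same topological relation to the park, yet the metric measure assigns them different terms, which is the distinction the two parts of Figure 2 are meant to convey.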
The hypothesis for this research is:

"For the same topology, metric influences the choice of natural-language spatial terms."

The hypothesis is proved by:

creating a formal framework based on the categorization of spatial relations as well as the identification of spatial parameters that can be tested;
calibrating the values of these spatial parameters for specific terms; and
analyzing the distribution of the metric parameters for the spatial terms.

Using the 9-intersection model (Egenhofer and Herring 1991) as a foundation, a method for describing natural-language spatial relations is formulated. An elaborate definition of the parameters and the methodology for calibrating the values of the spatial parameters will be presented at the conference.

5. Scope of the Study
The GIS literature frequently cites three types of spatial relations as the most fundamental ones (Pullar and Egenhofer 1988; Worboys 1992):

Topological relations are those spatial relations that are invariant under continuous transformations with continuous inverses.
Metric relations refer to distance relations, which may be quantitative if the distance has been measured or computed, or qualitative if distance descriptors such as near or far are used.
Direction relations refer to azimuths (or bearings), which are quantitative, or to direction descriptors such as forward, right, or east, which are qualitative.

This research builds on strong evidence that among those spatial relations, topological properties are the most fundamental (Lynch 1960; Kuipers 1978; Riesbeck 1980; Mark and Egenhofer 1994a), and that metric or direction properties are appropriate refinements of certain topological configurations: topology matters, metric refines (Egenhofer and Mark 1995b). This research focuses on metric refinements of topological relations.

In order to allow for qualitative queries, the semantics of natural-language spatial relations must be defined and represented within a GIS. A variety of properties may contribute to the choice of a particular spatial term (Mark et al. 1995), such as the natural language (English vs. Spanish), the culture, the semantics of the spatial objects involved, the tasks users envision, the context in which the objects are presented, the pictorial presentation (a sketch vs. a topographic map), and the objects' geometries. We concentrate here on the geometrical aspects. The approach taken here is a refinement of the 9-intersection model (Egenhofer and Herring 1991) to accommodate more of the semantics of natural-language spatial relations. The 9-intersection model itself has proved to be highly successful in the human-subject testing conducted in the first round (Mark and Egenhofer 1992; Mark and Egenhofer 1994a; Mark and Egenhofer 1994b; Mark and Egenhofer 1995). The refined model in this research will thus provide a framework for further testing in the second iteration of the human-subject testing procedures.

Combinations of topological relations involve points, lines, and regions. The treatment of topological relations and spatial predicates in this research is confined to line-region relationships only. For the purpose of calibrating the symbols, we focus on the road and park relationship (Mark and Egenhofer 1995).

6. Topics Excluded from the Present Investigation
Related issues that have been excluded from this investigation are:

Line-line and region-region relationships of the 9-intersection model have been excluded primarily to limit the scope of this work to line-region relations. The reasons are twofold: first, exploring the line-region relations aims to determine the basic principles of formalization and, secondly, the existence of human-subject data for the line-region relations aids the testing of the resulting formalism. The basic principles resulting from this investigation of line-region relations are expected to be extendible to line-line and region-region relationships, which will form a logical sequence for future work.
Orientations of the objects represented in a spatial configuration may have an influence on the choice of terms employed to describe the spatial relationships between these objects. Such orientations could play a role in further refining the semantics that are derived from mere metric information about the scene. From a formal perspective, there is a need to formalize the role of orientations (Abdelmoty 1994). This will require separate treatment and is not addressed in this research.
The formal definition of spatial entities is not included in this work, as such formal semantics (Rugg 1995) are provided for in current standards such as the Spatial Data Transfer Standard. Future work in this area is being carried out by the sub-committees of the Federal Geographic Data Committee.
Context of the configuration and tasks envisioned by users are factors that may influence the choice of spatial relationships. For example, Bangor airport is near the University of Maine if one has to board a plane; if one needs to buy groceries, however, the shops in Bangor are far from Orono. We do not consider this issue in this research, because its scope is focused primarily on the computational formal framework of spatial relationships.
The pictorial representations of configurations may have an effect on the determination of the best spatial description to be used. There is evidence that people use different entries to judge pictures and sentences instantiating the same sense (Herskovits to appear). As this issue deals with the representation of the spatial entities themselves and not of spatial relationships, it is excluded from the investigation in this research.
This work is not concerned with cultural and linguistic aspects per se of spatial prepositions. We propose a formal framework for capturing the semantics of spatial relationships and validate our model by testing it with data obtained from experiments involving human subjects.

7. Major Results and Conclusions
The major finding of this research is that it is possible to create a formal model for representing natural-language spatial relations. The model identifies metric parameters that are a refinement of topology. The results revealed that these proposed parameters provide the discriminatory function in the grouping of the natural-language terms. For terms with the same topology, it was proven that metric influences the choice of natural-language spatial terms.

It was also found that when a parameter is significant for a spatial term in the prototypical test, there is a corresponding significant increase in agreement in the agreement task (Mark 1996) for that particular spatial term in describing a configuration in which this parameter is present. This observation was found to be true for the English data sets that were tested. As these findings are based on the analysis of two separate and distinct data sets, they show that the results from the prototypical tests are not data dependent.

The formal model has the flexibility of having its parameters calibrated for particular spatial terms as well as for particular domains of users. Based on the data available, the minimum and maximum extents of the parameter values for each group, as well as for each natural-language spatial term used in this investigation, were calibrated. These results were compiled to create a Metric Table of Spatial Terms. This table is significant as it allows for the discrimination between spatial terms based on the topological and metric parameters.
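The idea of a table of minimum and maximum parameter values per term can be sketched as follows. This is only an illustration of the general principle: the data layout, the parameter name "d", and both function names are hypothetical and do not reproduce the Metric Table of Spatial Terms or the data used in this research.

```python
def build_metric_table(observations):
    """Collect the (min, max) extent of each parameter for each term.

    observations: iterable of (term, parameter_name, value) triples.
    Returns a dict keyed by (term, parameter_name).
    """
    table = {}
    for term, param, value in observations:
        key = (term, param)
        low, high = table.get(key, (value, value))
        table[key] = (min(low, value), max(high, value))
    return table


def candidate_terms(table, param, value):
    """Terms whose calibrated range for `param` contains `value`."""
    return sorted(
        term
        for (term, p), (low, high) in table.items()
        if p == param and low <= value <= high
    )


# Made-up observations for a hypothetical distance-like parameter "d".
observations = [
    ("goes along", "d", 1.0),
    ("goes along", "d", 4.0),
    ("is outside", "d", 3.0),
    ("is outside", "d", 60.0),
]
table = build_metric_table(observations)
```

Where the ranges of two terms overlap, a lookup returns both candidates, so a single parameter narrows the choice of term without necessarily deciding it; several parameters would be combined to discriminate further.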

Bibliography

A. Abdelmoty (1995) Modelling and Reasoning in Spatial Databases: a Deductive Object-Oriented Approach Ph. D. thesis, Department of Computer Science, Heriot-Watt University, Edinburgh, Scotland.
I. Campari (1994) GIS Commands As Small Scale Space Terms: Cross-Cultural Conflict on their Spatial Content. In: T. Waugh and R. Healey (eds.), Sixth International Symposium on Spatial Data Handling, Edinburgh, Scotland, pp. 554-571.
M. Egenhofer and J. Herring (1991) Categorizing Binary Topological Relationships Between Regions, Lines, and Points in Geographic Databases. Department of Surveying Engineering, University of Maine. Orono, ME.
M. Egenhofer and D. Mark (1995b) Naïve Geography. In: A. Frank and W. Kuhn (eds.), Spatial Information Theory-A Theoretical Basis for GIS, International Conference COSIT '95, Semmering, Austria. Lecture Notes in Computer Science 988, pp. 1-15, Berlin: Springer-Verlag.
A. Frank (1992) Qualitative Reasoning about Distance and Directions in Geographic Space. Journal of Visual Languages and Computing 3(4): 343-371.
D. Hernandez, E. Clementini, and P. Di Felice (1995) Qualitative Distances. In: A. Frank and W. Kuhn (eds.), Spatial Information Theory-A Theoretical Basis for GIS, International Conference COSIT '95, Semmering, Austria. Lecture Notes in Computer Science 988, pp. 45-58, Berlin: Springer-Verlag.
A. Herskovits (to appear) Language, Spatial Cognition, and Vision. In: O. Stock (ed.), Temporal and Spatial Reasoning, Dordrecht, The Netherlands: Kluwer.
A. Herskovits (1986) Language and Spatial Cognition: An Interdisciplinary Study of the Prepositions in English, Cambridge, MA: Cambridge University Press.
J. Hong (1994) Qualitative Distance and Direction Reasoning in Geographic Space. Ph. D. Thesis, Department of Spatial Information Science and Engineering, University of Maine, ME.
J. Hong, M. Egenhofer, and A. Frank (1995) On the Robustness of Qualitative Distance and Directions Reasoning. In: D. Peuquet (ed.), Autocarto 12, Charlotte, NC, pp. 301-310.
B. Kuipers (1978) Modeling Spatial Knowledge. Cognitive Science 2: 129-153.
K. Lynch (1960) The Image of the City, Cambridge, MA: MIT Press.
D. Mark (1996) Data collected by Mark (personal communication).
D. Mark, D. Comas, M. Egenhofer, S. Freundschuh, M. Gould, and J. Nunes (1995) Evaluating and Refining Computational Models of Spatial Relations Through Cross-Linguistic Human-Subject Testing. In: A. Frank and W. Kuhn (eds.), Spatial Information Theory-A Theoretical Basis for GIS, International Conference COSIT '95, Semmering, Austria. Lecture Notes in Computer Science 988, pp. 553-568, Berlin: Springer-Verlag.
D. Mark and M. Egenhofer (1992) An Evaluation of the 9-intersection for Region-Line Relations. In: GIS/LIS '92, San Jose, CA, pp. 513-521.
D. Mark and M. Egenhofer (1994a) Calibrating the Meaning of Spatial Predicates from Natural Language: Line-Region Relations. In: T. Waugh and R. Healey (eds.), Sixth International Symposium on Spatial Data Handling, Edinburgh, Scotland, pp. 538-553.
D. Mark and M. Egenhofer (1995) Topology of Prototypical Spatial Relations Between Lines and Regions in English and Spanish. In: D. Peuquet (ed.), Autocarto 12, Charlotte, NC, pp. 245-254.
D. Peuquet and Z. Ci-Xiang (1987) An algorithm to determine the directional relationship between arbitrarily shaped polygons in the plane. Pattern Recognition 20(1): 65-74.
D. Pullar and M. Egenhofer (1988) Towards Formal Definitions of Topological Relations Among Spatial Objects. In: D. Marble (ed.), Third International Symposium on Spatial Data Handling, Sydney, Australia, pp. 225-242.
T. Regier (1995) A model of the human capacity for categorizing spatial relations, Cognitive Linguistics 6(1): 63-88.
C. Riesbeck (1980) "You Can't Miss It": Judging the Clarity of Directions. Cognitive Science 4: 285-303.
R. Rugg (1995) Extending the SDTS Model of Features and Attributes, Association of American Geographers: 91st Annual Meeting, Chicago, IL, p. 266.
L. Talmy (1983) How Language Structures Space. In: H. Pick and L. Acredolo (eds.), Spatial Orientation, New York: Plenum Press, pp. 225-282.
M. Worboys (1992) A Geometric Model for Planar Geographical Objects. International Journal of Geographical Information Systems 6(5): 353-372.